From 796c461c466a85db73bac986252c67635ef2c5ff Mon Sep 17 00:00:00 2001 From: Andy Stark Date: Thu, 14 May 2026 13:06:07 +0100 Subject: [PATCH 1/4] DOC-6619 draft of Python example --- content/develop/use-cases/streaming/_index.md | 135 +++ .../use-cases/streaming/redis-py/_index.md | 410 +++++++++ .../streaming/redis-py/consumer_worker.py | 167 ++++ .../streaming/redis-py/demo_server.py | 826 ++++++++++++++++++ .../streaming/redis-py/event_stream.py | 321 +++++++ 5 files changed, 1859 insertions(+) create mode 100644 content/develop/use-cases/streaming/_index.md create mode 100644 content/develop/use-cases/streaming/redis-py/_index.md create mode 100644 content/develop/use-cases/streaming/redis-py/consumer_worker.py create mode 100644 content/develop/use-cases/streaming/redis-py/demo_server.py create mode 100644 content/develop/use-cases/streaming/redis-py/event_stream.py diff --git a/content/develop/use-cases/streaming/_index.md b/content/develop/use-cases/streaming/_index.md new file mode 100644 index 0000000000..933db116ee --- /dev/null +++ b/content/develop/use-cases/streaming/_index.md @@ -0,0 +1,135 @@ +--- +categories: +- docs +- develop +- stack +- oss +- rs +- rc +description: Process ordered event streams with consumer groups, replay, and configurable retention. +hideListLinks: true +linkTitle: Streaming +title: Redis streaming +weight: 5 +--- + +## When to use Redis streaming + +Use Redis streaming when you need to process and deliver ordered event streams — user actions, telemetry, transactions, inter-service messages — with consumer groups, replay, and configurable retention, without standing up a dedicated streaming platform. + +## Why the problem is hard + +Continuous event flows pushed through primary databases or ad-hoc queues add latency on the request path, make backpressure hard to control, and tightly couple producers to consumers. 
Some of the obvious workarounds have real drawbacks: + +- **A dedicated streaming platform** (Kafka, Pulsar) solves all of this but adds significant + operational overhead — separate clusters, partition management, consumer rebalancing — that's + disproportionate when retention windows are hours or days, not months. +- **Pub/sub** ([Redis Pub/Sub]({{< relref "/develop/pubsub" >}}), MQTT) is fire-and-forget + transport: messages are delivered to whoever is connected and discarded, with no persistence, + replay, or consumer tracking. +- **Polling a primary database for new rows** generates constant load on the system of record, + struggles to order events from concurrent writers, and offers no replay or per-consumer cursor. + +A workable streaming layer needs an ordered, durable log, independent consumer tracking with +acknowledgment, at-least-once delivery, and retention controls — all without introducing a +separate broker for moderate-scale workloads. + +This pattern is distinct from [pub/sub]({{< relref "/develop/use-cases/pub-sub" >}}), which is +at-most-once transport with no history: a subscriber that's offline when a message is published +misses it for good. It is also distinct from a +[job queue]({{< relref "/develop/use-cases/job-queue" >}}), where each task is claimed by exactly +one worker and discarded after it completes. Streaming retains the ordered history, so many +independent consumer groups can read the same events at their own pace and replay from any point. + +## What you can expect from a Redis solution + +You can: + +- Deliver ordered events to multiple independent consumer groups, each processing the full + stream at its own pace. +- Scale consumers horizontally within a group to share work across workers, with at-least-once + delivery and per-consumer tracking. +- Replay historical events for debugging, bootstrapping a new projection, or rebuilding a + downstream system from scratch. 
+- Bound memory by retaining events by length or by minimum ID, without a separate cleanup job. +- Recover unacknowledged entries from crashed consumers so no event sits invisibly in flight. +- Partition streams by tenant, region, or entity for load distribution and per-entity event + sourcing. +- Replace a dedicated Kafka deployment for moderate-scale, short-retention streaming workloads + using infrastructure you already run. + +## How Redis supports the solution + +In practice, producers append events to a stream with +[`XADD`]({{< relref "/commands/xadd" >}}) and Redis assigns each entry an auto-generated, +time-ordered ID. Consumers either read the stream directly with +[`XREAD`]({{< relref "/commands/xread" >}}), or join a *consumer group* and read with +[`XREADGROUP`]({{< relref "/commands/xreadgroup" >}}), which gives every consumer a private +cursor and a pending-entries list of in-flight messages. Once a consumer finishes processing an +entry, it acknowledges it with [`XACK`]({{< relref "/commands/xack" >}}); entries left +unacknowledged past a timeout can be reassigned to a healthy consumer with +[`XCLAIM`]({{< relref "/commands/xclaim" >}}) or +[`XAUTOCLAIM`]({{< relref "/commands/xautoclaim" >}}). + +Redis provides the following features that make it a good fit for streaming: + +- [Streams]({{< relref "/develop/data-types/streams" >}}) + ([`XADD`]({{< relref "/commands/xadd" >}}), + [`XLEN`]({{< relref "/commands/xlen" >}})) provide an append-only log with auto-generated + time-ordered IDs, so ordering is intrinsic to the data structure rather than something the + application has to maintain. +- [Consumer groups]({{< relref "/develop/data-types/streams#consumer-groups" >}}) + ([`XREADGROUP`]({{< relref "/commands/xreadgroup" >}}), + [`XACK`]({{< relref "/commands/xack" >}})) give at-least-once delivery with per-consumer + cursors and acknowledgment, so workers in a group share the stream's work and multiple groups + read the same stream independently. 
+- [`XRANGE`]({{< relref "/commands/xrange" >}}) and + [`XREVRANGE`]({{< relref "/commands/xrevrange" >}}) support replay and range queries — + bootstrap a new projection from the start of the stream, audit recent events, or run + point-in-time reads by ID range. +- [`XPENDING`]({{< relref "/commands/xpending" >}}), + [`XCLAIM`]({{< relref "/commands/xclaim" >}}), and + [`XAUTOCLAIM`]({{< relref "/commands/xautoclaim" >}}) recover messages a crashed consumer + left in flight, so no event sits invisibly past its processing window. +- Retention controls — [`XADD ... MAXLEN ~ n`]({{< relref "/commands/xadd" >}}) and + [`XTRIM MINID ~ id`]({{< relref "/commands/xtrim" >}}) — bound stream size by length or by + oldest event, so memory stays bounded as the stream rolls forward. +- Sub-millisecond reads and writes from memory, so streaming runs on the same Redis instance + already handling cache, sessions, or rate limiting at zero marginal cost. + +## Ecosystem + +The following libraries and frameworks use Redis Streams for event-driven workloads: + +- **Java**: + [Spring Data Redis Streams](https://docs.spring.io/spring-data/redis/reference/redis/redis-streams.html) + for consumer-group processing with producer/consumer abstractions and pending-entries handling. +- **Node.js**: [`node-redis`](https://github.com/redis/node-redis) and + [`ioredis`](https://github.com/redis/ioredis) for stream producers and consumers in + event-driven APIs. +- **Python**: [`redis-py`](https://redis.readthedocs.io/) with + [FastAPI](https://fastapi.tiangolo.com/) or [Django](https://www.djangoproject.com/) for + microservice event pipelines. 
+- **Infrastructure**: + [Active-Active geo-distribution]({{< relref "/operate/rs/databases/active-active" >}}) on + Redis Enterprise / Redis Cloud for cross-region stream replication; + [Azure Managed Redis](https://azure.microsoft.com/en-us/products/managed-redis) with + [Azure Functions](https://azure.microsoft.com/en-us/products/functions) for serverless event + backbones. + +## Code examples to build your own Redis streaming pipeline + +The following guides show how to build a simple Redis-backed event stream with producers and +consumer groups. Each guide includes a runnable interactive demo that lets you produce events, +scale consumers within a group, replay history from any point, and watch independent groups +read the same stream at their own pace. + +* [redis-py (Python)]({{< relref "/develop/use-cases/streaming/redis-py" >}}) +* [node-redis (Node.js)]({{< relref "/develop/use-cases/streaming/nodejs" >}}) +* [go-redis (Go)]({{< relref "/develop/use-cases/streaming/go" >}}) +* [Jedis (Java)]({{< relref "/develop/use-cases/streaming/java-jedis" >}}) +* [Lettuce (Java)]({{< relref "/develop/use-cases/streaming/java-lettuce" >}}) +* [StackExchange.Redis (C#)]({{< relref "/develop/use-cases/streaming/dotnet" >}}) +* [Predis (PHP)]({{< relref "/develop/use-cases/streaming/php" >}}) +* [redis-rb (Ruby)]({{< relref "/develop/use-cases/streaming/ruby" >}}) +* [redis-rs (Rust)]({{< relref "/develop/use-cases/streaming/rust" >}}) diff --git a/content/develop/use-cases/streaming/redis-py/_index.md b/content/develop/use-cases/streaming/redis-py/_index.md new file mode 100644 index 0000000000..d48dbe217a --- /dev/null +++ b/content/develop/use-cases/streaming/redis-py/_index.md @@ -0,0 +1,410 @@ +--- +categories: +- docs +- develop +- stack +- oss +- rs +- rc +description: Implement a Redis event-streaming pipeline in Python with redis-py +linkTitle: redis-py example (Python) +title: Redis streaming with redis-py +weight: 1 +--- + +This guide shows you how to build a 
Redis-backed event-streaming pipeline in Python with [`redis-py`]({{< relref "/develop/clients/redis-py" >}}). It includes a small local web server built with the Python standard library so you can produce events into a single Redis Stream, watch two independent consumer groups read it at their own pace, and recover stuck deliveries with `XAUTOCLAIM` after simulating a consumer crash. + +## Overview + +A Redis Stream is an append-only log of field/value entries with auto-generated, time-ordered IDs. Producers append with [`XADD`]({{< relref "/commands/xadd" >}}); consumers belong to *consumer groups* and read with [`XREADGROUP`]({{< relref "/commands/xreadgroup" >}}), which gives each consumer a private cursor and a pending-entries list (PEL) of in-flight messages. Once a consumer has processed an entry it acknowledges it with [`XACK`]({{< relref "/commands/xack" >}}); entries left unacknowledged past an idle threshold can be reassigned to a healthy consumer with [`XAUTOCLAIM`]({{< relref "/commands/xautoclaim" >}}). + +That gives you: + +* Ordered, durable history that many independent consumer groups can read at their own pace +* At-least-once delivery, with per-consumer pending lists and automatic recovery of crashed consumers +* Horizontal scaling within a group — add a consumer and Redis automatically splits the work +* Replay of any range with [`XRANGE`]({{< relref "/commands/xrange" >}}), independent of consumer-group state +* Bounded retention through [`XADD MAXLEN ~`]({{< relref "/commands/xadd" >}}) or + [`XTRIM MINID ~`]({{< relref "/commands/xtrim" >}}), without a separate cleanup job + +In this example, producers append order events (`order.placed`, `order.paid`, `order.shipped`, `order.cancelled`) to a single stream at `demo:events:orders`. Two consumer groups read the same stream: + +* **`notifications`** — two consumers (`worker-a`, `worker-b`) sharing the work, modelling a fan-out worker pool. 
+* **`analytics`** — one consumer (`worker-c`) processing the full event flow on its own. + +## How it works + +The flow looks like this: + +1. The application calls `stream.produce(event_type, payload)` which runs [`XADD`]({{< relref "/commands/xadd" >}}) with an approximate [`MAXLEN ~`]({{< relref "/commands/xadd" >}}) cap. Redis assigns an auto-generated time-ordered ID. +2. Each consumer thread loops on [`XREADGROUP`]({{< relref "/commands/xreadgroup" >}}) with the special ID `>` (meaning "deliver entries this group has not yet delivered to anyone") and a short block timeout. +3. After processing each entry, the consumer calls [`XACK`]({{< relref "/commands/xack" >}}) so Redis can drop it from the group's pending list. +4. If a consumer is killed (or crashes) before acking, its entries sit in the group's PEL. A periodic [`XAUTOCLAIM`]({{< relref "/commands/xautoclaim" >}}) sweep reassigns idle entries to a healthy consumer. +5. Anyone — including code outside the consumer groups — can read history with [`XRANGE`]({{< relref "/commands/xrange" >}}) without affecting any group's cursor. + +Each consumer group has its own cursor (`last-delivered-id`) and its own pending list, so the two groups in this demo process the same events without coordinating with each other. 
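To make the per-group cursor and pending list concrete, here is a toy in-memory model. It is not Redis and not part of the demo source: `ToyStream` and its method names are hypothetical stand-ins for `XADD`, `XREADGROUP` with `>`, `XACK`, and `XRANGE`, and the sketch ignores blocking, idle times, and trimming.

```python
from collections import defaultdict


class ToyStream:
    """Hypothetical in-memory model of one stream with consumer groups."""

    def __init__(self):
        self.entries = []               # ordered log of (entry_id, fields)
        self.cursor = defaultdict(int)  # group -> index of next undelivered entry
        self.pel = defaultdict(dict)    # group -> {entry_id: owning consumer}

    def add(self, fields):
        # Stand-in for XADD: a counter replaces the real time-based ID.
        entry_id = f"{len(self.entries) + 1}-0"
        self.entries.append((entry_id, fields))
        return entry_id

    def read_group(self, group, consumer, count=10):
        # Stand-in for XREADGROUP with ">": hand out entries this group
        # has not delivered to anyone, and record them as pending.
        start = self.cursor[group]
        batch = self.entries[start:start + count]
        self.cursor[group] = start + len(batch)
        for entry_id, _fields in batch:
            self.pel[group][entry_id] = consumer
        return batch

    def ack(self, group, entry_id):
        # Stand-in for XACK: drop the entry from the group's pending list.
        return 1 if self.pel[group].pop(entry_id, None) is not None else 0

    def history(self):
        # Stand-in for XRANGE - +: no cursor moves, no pending entry changes.
        return list(self.entries)


stream = ToyStream()
for n in range(5):
    stream.add({"type": "order.placed", "order_id": f"o-{n}"})

# Two workers share the notifications group; analytics reads everything itself.
batch_a = stream.read_group("notifications", "worker-a", count=3)
batch_b = stream.read_group("notifications", "worker-b", count=3)
batch_c = stream.read_group("analytics", "worker-c", count=10)

# worker-a acks its entries; worker-b "crashes" and acks nothing.
for entry_id, _fields in batch_a:
    stream.ack("notifications", entry_id)

print(len(batch_a), len(batch_b), len(batch_c))  # 3 2 5
print(sorted(stream.pel["notifications"]))       # ['4-0', '5-0']
print(len(stream.pel["analytics"]))              # 5
```

The two groups never touch each other's cursor or pending list, and the entries that `worker-b` left unacknowledged are exactly what a claim sweep would later pick up.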
+ +## The event-stream helper + +The `RedisEventStream` class wraps the stream operations +([source](https://github.com/redis/docs/blob/main/content/develop/use-cases/streaming/redis-py/event_stream.py)): + +```python +import redis +from event_stream import RedisEventStream + +r = redis.Redis(host="localhost", port=6379, decode_responses=True) +stream = RedisEventStream( + redis_client=r, + stream_key="demo:events:orders", + maxlen_approx=2000, # retention guardrail + claim_min_idle_ms=5000, # XAUTOCLAIM threshold +) + +# Producer +stream_id = stream.produce( + "order.placed", + {"order_id": "o-1234", "customer": "alice", "amount": "49.50"}, +) + +# Consumer group + one consumer +stream.ensure_group("notifications", start_id="0-0") +entries = stream.consume("notifications", "worker-a", count=10, block_ms=500) +for entry_id, fields in entries: + handle(fields) # your processing + stream.ack("notifications", [entry_id]) # XACK + +# Recover entries from a crashed consumer (idle ≥ claim_min_idle_ms) +stream.autoclaim("notifications", "worker-b", count=100) + +# Replay history (independent of any group's cursor) +for entry_id, fields in stream.replay("-", "+", count=50): + print(entry_id, fields) +``` + +### Data model + +Each event is a single stream entry — a flat dict of field/value strings — with an auto-generated time-ordered ID: + +```text +demo:events:orders + 1716998413541-0 type=order.placed order_id=o-1234 customer=alice amount=49.50 ts_ms=... + 1716998413542-0 type=order.paid order_id=o-1234 customer=alice amount=49.50 ts_ms=... + 1716998413542-1 type=order.shipped order_id=o-1235 customer=bob amount=12.00 ts_ms=... + ... +``` + +The ID is `{milliseconds}-{sequence}`, so IDs are globally ordered and you can range-query by approximate wall-clock time without any extra index. The implementation uses: + +* [`XADD ... 
MAXLEN ~ n`]({{< relref "/commands/xadd" >}}), pipelined, for batch production with a retention cap +* [`XREADGROUP`]({{< relref "/commands/xreadgroup" >}}) with the special ID `>` for fresh deliveries to a consumer +* [`XACK`]({{< relref "/commands/xack" >}}) on every processed entry +* [`XAUTOCLAIM`]({{< relref "/commands/xautoclaim" >}}) for sweeping idle pending entries to a healthy consumer +* [`XRANGE`]({{< relref "/commands/xrange" >}}) for replay and audit +* [`XPENDING`]({{< relref "/commands/xpending" >}}) for inspecting the per-group pending list +* [`XINFO STREAM`]({{< relref "/commands/xinfo-stream" >}}), + [`XINFO GROUPS`]({{< relref "/commands/xinfo-groups" >}}), and + [`XINFO CONSUMERS`]({{< relref "/commands/xinfo-consumers" >}}) for surface-level observability +* [`XTRIM`]({{< relref "/commands/xtrim" >}}) for explicit retention enforcement + +## Producing events + +`produce_batch` pipelines `XADD` calls in a single round trip. Each call carries an approximate `MAXLEN ~` cap so the stream stays bounded as it rolls forward: + +```python +def produce_batch(self, events: Iterable[tuple[str, dict]]) -> list[str]: + pipe = self.redis.pipeline(transaction=False) + for event_type, payload in events: + fields = self._encode_fields(event_type, payload) + pipe.xadd( + self.stream_key, + fields, + maxlen=self.maxlen_approx, + approximate=True, + ) + ids = pipe.execute() + ... + return list(ids) +``` + +The `~` flavour of `MAXLEN` lets Redis trim at a macro-node boundary, which is much cheaper than exact trimming and is what you want when the cap is a retention *guardrail*, not a hard size constraint. With 300 events produced and `MAXLEN ~ 50`, you might end up with 100 entries left — Redis released the oldest whole macro-node and stopped. The next `XADD` will keep length stable. + +If you genuinely need an exact cap (rare), drop `approximate=True`. The performance difference is significant on busy streams. 
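The 300-event figure above can be reproduced with a small simulation. This is a simplified model, not Redis code: it assumes every macro-node holds exactly 100 entries (ignoring any byte-size limit on nodes) and that an approximate trim evicts the oldest whole node only while the stream would still hold at least `maxlen` entries afterwards. The function name and parameters are illustrative.

```python
from collections import deque


def simulate_approx_maxlen(total_events, maxlen, node_size=100):
    """Return the stream length after total_events appends with MAXLEN ~ maxlen.

    Simplified model: fixed-size nodes, whole-node eviction only.
    """
    nodes = deque()  # entry count per macro-node, oldest first
    length = 0
    for _ in range(total_events):
        # Append to the newest node, or open a new node when it is full.
        if not nodes or nodes[-1] >= node_size:
            nodes.append(0)
        nodes[-1] += 1
        length += 1
        # Approximate trim: drop the oldest node only if at least
        # maxlen entries would remain afterwards.
        while nodes and length - nodes[0] >= maxlen:
            length -= nodes.popleft()
    return length


print(simulate_approx_maxlen(300, 50))  # 100
print(simulate_approx_maxlen(120, 50))  # 120
```

Under these assumptions the second call shows why a small stream can sit above its cap: with 120 entries, evicting the only full node would leave 20, which is below the cap, so nothing is trimmed.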
+ +## Reading with a consumer group + +Each consumer in a group runs the same `XREADGROUP` loop. The special ID `>` means "deliver entries this group has not yet delivered to *anyone*": + +```python +def consume( + self, + group: str, + consumer: str, + count: int = 10, + block_ms: int = 500, +) -> list[Entry]: + result = self.redis.xreadgroup( + group, + consumer, + {self.stream_key: ">"}, + count=count, + block=block_ms, + ) + return _flatten_entries(result) +``` + +`block_ms` makes the call efficient even when the stream is idle: the client parks on the server until either an entry arrives or the timeout expires, so consumers don't busy-loop. + +Reading with an explicit ID like `0-0` instead of `>` does something different — it replays entries already delivered to *this* consumer name (its private PEL). That is the canonical recovery path when the same consumer restarts: catch up on its own pending entries first, then resume reading new ones. + +## Acknowledging entries + +Once the consumer has processed an entry, `XACK` tells Redis it can drop the entry from the group's pending list: + +```python +def ack(self, group: str, ids: Iterable[str]) -> int: + ids = list(ids) + if not ids: + return 0 + return int(self.redis.xack(self.stream_key, group, *ids)) +``` + +This is the linchpin of at-least-once delivery: an entry that is never acked stays in the PEL forever (until a claim moves it elsewhere). If your consumer thread crashes between processing and ack, the entry is *retained*, not lost — the next claim sweep picks it up. + +The trade-off is the opposite of pub/sub: a slow or crashed consumer doesn't lose messages, but it does mean your downstream system must be idempotent. If you process an order twice because the first attempt died after the side effect but before the ack, the second attempt must be safe. 
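One way to make a handler safe under redelivery is to guard the side effect with the entry ID. The sketch below is illustrative only: `IdempotentHandler` is a hypothetical name, and the in-memory set stands in for a durable guard that all consumers in the group would have to share.

```python
class IdempotentHandler:
    """Hypothetical wrapper that applies a side effect at most once per entry ID."""

    def __init__(self, side_effect):
        self._side_effect = side_effect
        self._done = set()  # in-memory stand-in for a durable, shared guard

    def handle(self, entry_id, fields):
        if entry_id in self._done:
            # Duplicate delivery: skip the side effect; the caller can still ack.
            return False
        self._side_effect(fields)
        self._done.add(entry_id)
        return True


charged = []
handler = IdempotentHandler(lambda fields: charged.append(fields["order_id"]))

entry = ("1716998413541-0", {"type": "order.paid", "order_id": "o-1234"})
first = handler.handle(*entry)   # original delivery
second = handler.handle(*entry)  # redelivery after a claim

print(first, second, charged)    # True False ['o-1234']
```

In a real deployment the guard and the side effect should ideally be committed together (for example in the same downstream transaction), because a crash between the two steps in this sketch would still allow a repeat.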
+ +## Multiple consumer groups, one stream + +The big difference between Redis Streams and a job queue is that any number of independent consumer groups can read the same stream. The demo sets up two groups on `demo:events:orders`: + +```python +stream.ensure_group("notifications", start_id="0-0") +stream.ensure_group("analytics", start_id="0-0") +``` + +Each group has its own cursor. Producing 5 events results in `notifications` and `analytics` each receiving all 5, with no coordination between them. Within `notifications`, the work is split across `worker-a` and `worker-b`: Redis hands each `XREADGROUP` call whatever entries are not yet delivered to anyone in the group, so adding a second worker doubles throughput without any rebalance logic. + +The `start_id="0-0"` argument means "deliver everything in the stream from the beginning" — useful in a demo and for fresh groups bootstrapping from history. In production, a brand-new group reading a long-existing stream usually starts at `$` ("only events after this point") and uses [`XRANGE`]({{< relref "/commands/xrange" >}}) explicitly if it needs history. + +## Recovering crashed consumers with XAUTOCLAIM + +The demo's "Crash next 3" button tells a chosen consumer to drop its next three deliveries on the floor without acking them — the same effect as a worker process dying mid-message. Those entries stay in the group's PEL with their delivery counter incremented. 
Once they have been idle for at least `claim_min_idle_ms`, the recovery sweep picks them up: + +```python +def autoclaim( + self, + group: str, + consumer: str, + count: int = 100, + start_id: str = "0-0", +) -> list[Entry]: + _next_id, claimed, _deleted = self.redis.xautoclaim( + self.stream_key, + group, + consumer, + min_idle_time=self.claim_min_idle_ms, + start_id=start_id, + count=count, + ) + return list(claimed) +``` + +`XAUTOCLAIM` walks the group's PEL, finds every entry idle longer than `min_idle_time`, reassigns it to the named consumer, and returns the reassigned entries. The delivery counter is incremented on every claim — after a few cycles you can use it to detect a *poison-pill* message that crashes every consumer that touches it, and route it to a dead-letter stream instead of looping forever. + +In production this loop runs periodically (every few seconds) on every healthy consumer, or on a dedicated reaper. The demo exposes it as a button so you can trigger it manually after waiting for the idle threshold. + +`XCLAIM` (singular, no auto) does the same thing for a specific list of entry IDs you already have in hand — useful when you want to take ownership of one known stuck entry rather than sweep the whole PEL. + +## Replay with XRANGE + +`XRANGE` reads a slice of history. It is completely independent of any consumer group — no cursors move, no acks happen — so it is safe to call any number of times, from any process: + +```python +def replay( + self, + start_id: str = "-", + end_id: str = "+", + count: int = 100, +) -> list[Entry]: + return list(self.redis.xrange( + self.stream_key, min=start_id, max=end_id, count=count, + )) +``` + +The special IDs `-` and `+` mean "from the very beginning" and "to the very end". You can also pass real IDs (`1716998413541-0`) or just the millisecond part (`1716998413541`, which Redis interprets as "any entry with this timestamp"). 
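Because the millisecond part of an ID is a Unix timestamp, a wall-clock window can be turned into a pair of range bounds with plain date arithmetic. The helper names below (`to_ms`, `window_ids`) are hypothetical, and the mapping is only as accurate as the clock that generated the IDs.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_ms(moment: datetime) -> int:
    """Whole milliseconds since the Unix epoch for a timezone-aware datetime."""
    return (moment - EPOCH) // timedelta(milliseconds=1)


def window_ids(end: datetime, window: timedelta) -> tuple[str, str]:
    """Return (start_id, end_id) as millisecond-only IDs covering the window."""
    return str(to_ms(end - window)), str(to_ms(end))


incident = datetime(2024, 5, 29, 16, 0, 13, tzinfo=timezone.utc)
start_id, end_id = window_ids(incident, timedelta(minutes=5))
print(int(end_id) - int(start_id))  # 300000
```

The two strings can then be passed as `start_id` and `end_id` to the `replay` helper shown above to read the five minutes leading up to the incident.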
+ +Typical uses: + +* **Bootstrapping a new projection** — read the entire stream from `-` and build a derived view in another store (a search index, a SQL table, a different cache). Doing this against a consumer group would consume the entries; `XRANGE` lets you do it without disrupting live consumers. +* **Auditing recent activity** — read the last few minutes by ID range without touching any group cursor. +* **Debugging** — fetch one specific entry by its ID, or a tight range around an incident timestamp, to see exactly what producers wrote. + +## The consumer worker thread + +`ConsumerWorker` wraps the `XREADGROUP` → process → `XACK` loop in a daemon thread +([source](https://github.com/redis/docs/blob/main/content/develop/use-cases/streaming/redis-py/consumer_worker.py)): + +```python +def _run(self) -> None: + while not self._stop_event.is_set(): + if self._paused.is_set(): + time.sleep(0.05) + continue + try: + entries = self.stream.consume( + self.group, self.name, count=10, block_ms=500, + ) + except Exception as exc: + print(f"[{self.group}/{self.name}] read failed: {exc}") + time.sleep(0.5) + continue + + for entry_id, fields in entries: + if self.process_latency_ms: + time.sleep(self.process_latency_ms / 1000.0) + self._handle_entry(entry_id, fields) +``` + +`_handle_entry` either acks (the normal path) or, when the demo has asked the worker to "crash", drops the entry on the floor and increments a counter so the UI can show what is currently in the PEL waiting to be claimed. + +The pause and crash levers exist only for the demo. A real consumer is just the read-process-ack loop — everything else in this class is instrumentation. + +## Prerequisites + +* Redis 7.0 or later. `XAUTOCLAIM` has been available since Redis 6.2, but the `autoclaim` helper shown above unpacks the three-element reply introduced in Redis 7.0, and the `lag` field reported by `XINFO GROUPS` also requires 7.0. +* Python 3.9 or later. +* The `redis-py` client. Install it with: + + ```bash + pip install "redis>=5.0" + ``` + +If your Redis server is running elsewhere, start the demo with `--redis-host` and `--redis-port`. 
+ +## Running the demo + +### Get the source files + +The demo consists of three Python files. Download them from the [`redis-py` source folder](https://github.com/redis/docs/tree/main/content/develop/use-cases/streaming/redis-py) on GitHub, or grab them with `curl`: + +```bash +mkdir streaming-demo && cd streaming-demo +BASE=https://raw.githubusercontent.com/redis/docs/main/content/develop/use-cases/streaming/redis-py +curl -O $BASE/event_stream.py +curl -O $BASE/consumer_worker.py +curl -O $BASE/demo_server.py +``` + +### Start the demo server + +From that directory: + +```bash +python3 demo_server.py +``` + +You should see: + +```text +Redis streaming demo server listening on http://127.0.0.1:8083 +Using Redis at localhost:6379 with stream key 'demo:events:orders' (MAXLEN ~ 2000) +Seeded 3 consumer(s) across 2 group(s) +``` + +Open [http://127.0.0.1:8083](http://127.0.0.1:8083) in a browser. You can: + +* **Produce** any number of events of a chosen type (or random types). Watch the stream length grow and the tail update. +* See each **consumer group**: its `last-delivered-id`, the size of its pending list, and the consumers in it. Each consumer shows its processed count, pending count, and idle time. +* **Add or remove** consumers within a group at runtime to see Redis split the work across the new shape. +* Click **Crash next 3** on a consumer to drop its next three deliveries — the same effect as a worker process dying after `XREADGROUP` but before `XACK`. Watch the **Pending entries (XPENDING)** panel fill up. +* Wait until the idle time exceeds the threshold (default 5000 ms), pick a healthy target consumer, and click **XAUTOCLAIM to selected** — the stuck entries are reassigned and the delivery counter increments. +* **Replay (XRANGE)** any range to confirm the full history is independent of consumer-group state. +* **XTRIM** with an approximate `MAXLEN` to bound retention. 
Note that an approximate trim only releases whole macro-nodes — `MAXLEN ~ 50` on a small stream may not delete anything; on a 300-entry stream it typically lands at around 100. +* Click **Reset demo** to drop the stream and re-seed the default groups. + +## Production usage + +### Pick retention by length or by minimum ID + +The demo uses `MAXLEN ~` on every `XADD`. Two alternatives are worth considering: + +* `MINID ~` with a threshold ID — keep only entries newer than that ID. If you want "the last 24 hours", compute the wall-clock cutoff in milliseconds and pass that value followed by `-0` as the threshold to `XTRIM` with `MINID ~`. This is the right pattern when retention is time-bounded. +* No cap on `XADD` plus a periodic `XTRIM` job — useful if your producer is hot and the per-`XADD` work has to stay minimal, or if retention rules are complex (a separate process can also factor in consumer-group lag). + +In all three cases the trimming is approximate as long as you pass `~`. Use exact trimming (`MAXLEN n` or `MINID id` without `~`) only when you genuinely need an exact count. + +### Don't let consumer-group lag silently grow + +`XINFO GROUPS` reports each group's `lag` (entries the group has not yet read) and `pending` (entries delivered but not acked). In production, alert on either of these crossing a threshold — a steadily growing pending count usually means consumers are crashing without `XAUTOCLAIM` running, and a growing lag means consumers can't keep up with producers. + +The same applies inside a group: `XINFO CONSUMERS` reports per-consumer pending counts and idle times, so you can spot one slow consumer holding entries that the rest of the group is waiting on. + +### Make consumer logic idempotent + +`XAUTOCLAIM` can re-deliver an entry to a different consumer after a crash. If your processing has side effects (sending email, charging a card, updating a downstream store), make sure the same entry processed twice gives the same result — use an idempotency key, an upsert with conditional check, or a once-per-id guard table. 
Redis Streams cannot give you exactly-once semantics on its own. + +### Bound the delivery counter as a poison-pill signal + +`XPENDING` returns each entry's delivery count, incremented on every claim. If an entry has been delivered (and dropped) several times, the next consumer is unlikely to fare better. After some threshold — `deliveries >= 5`, say — route the entry to a *dead-letter stream*, ack it on the original group, and alert. Without this, one bad entry can stop the group's forward progress indefinitely. + +### Partition by tenant or entity for scale + +A single Redis Stream is a single key, and on a Redis Cluster a single key lives on a single shard. If your throughput exceeds what one shard can handle, partition the stream — for example by tenant ID (`events:orders:{tenant_a}`, `events:orders:{tenant_b}`) — so different tenants land on different shards. Hash-tags (`{tenant_a}`) keep all related streams on the same shard if you need to multi-stream atomically. + +Per-entity partitioning (`events:order:{order_id}`) is the canonical pattern when you treat each entity's stream as the event-sourcing log for that entity: every state change for one order goes on its own stream, which is also bounded in size by the entity's lifetime. + +### Use a separate consumer pool per group + +The demo runs every consumer in one process. In production each consumer group is usually its own deployment — its own pool of pods or VMs — so a slow projection in `analytics` cannot pull `notifications` workers off their stream. Each pod runs one consumer thread per CPU core, with `XAUTOCLAIM` either embedded in the consumer loop (every N reads, claim idle entries to self) or run by a separate reaper. + +### Don't read with XREAD (no group) and then try to ack + +`XREAD` and `XREADGROUP` are different mechanisms. `XREAD` is a tail-the-log read with no consumer-group state — entries are not added to any PEL, and you cannot `XACK` them. 
If you want at-least-once delivery and crash recovery, you must read through a consumer group. + +`XREAD` is still useful for read-only tail clients (a UI streaming events, a debugger, a `tail -f`-style command-line tool). It's just not part of the at-least-once path. + +### Inspect the stream directly with redis-cli + +When testing or troubleshooting, inspect the stream directly to confirm the consumer state is what you expect: + +```bash +# Stream summary +redis-cli XLEN demo:events:orders +redis-cli XINFO STREAM demo:events:orders + +# Group cursors and pending counts +redis-cli XINFO GROUPS demo:events:orders + +# Consumers within a group +redis-cli XINFO CONSUMERS demo:events:orders notifications + +# Pending entries with idle time and delivery count +redis-cli XPENDING demo:events:orders notifications - + 20 + +# Tail the stream live (no consumer-group state — like tail -f) +redis-cli XREAD BLOCK 0 STREAMS demo:events:orders '$' + +# Replay a range +redis-cli XRANGE demo:events:orders - + COUNT 50 +``` + +If a group's `lag` is growing while consumers' `idle` times are short, consumers are healthy but producers are outpacing them — add more consumers. If `pending` is growing while `lag` is small, consumers are *receiving* entries but not *acking* them — either they are crashing mid-message or your acking logic has a bug. + +## Learn more + +This example uses the following Redis commands: + +* [`XADD`]({{< relref "/commands/xadd" >}}) to append an event with an approximate `MAXLEN` cap. +* [`XREADGROUP`]({{< relref "/commands/xreadgroup" >}}) to read new entries for a consumer in a group. +* [`XACK`]({{< relref "/commands/xack" >}}) to acknowledge a processed entry. +* [`XAUTOCLAIM`]({{< relref "/commands/xautoclaim" >}}) to reassign idle pending entries to a healthy consumer. +* [`XRANGE`]({{< relref "/commands/xrange" >}}) for replay and audit, independent of consumer-group state. 
+* [`XPENDING`]({{< relref "/commands/xpending" >}}) to inspect the per-group pending list with idle times and delivery counts. +* [`XTRIM`]({{< relref "/commands/xtrim" >}}) for explicit retention enforcement. +* [`XGROUP CREATE`]({{< relref "/commands/xgroup-create" >}}) and + [`XGROUP DELCONSUMER`]({{< relref "/commands/xgroup-delconsumer" >}}) to manage groups and consumers. +* [`XINFO STREAM`]({{< relref "/commands/xinfo-stream" >}}), + [`XINFO GROUPS`]({{< relref "/commands/xinfo-groups" >}}), and + [`XINFO CONSUMERS`]({{< relref "/commands/xinfo-consumers" >}}) for observability. + +See the [`redis-py` documentation]({{< relref "/develop/clients/redis-py" >}}) for the full client reference, and the [Streams overview]({{< relref "/develop/data-types/streams" >}}) for the deeper conceptual model — consumer groups, the PEL, claim semantics, capped streams, and the differences with Kafka partitions. diff --git a/content/develop/use-cases/streaming/redis-py/consumer_worker.py b/content/develop/use-cases/streaming/redis-py/consumer_worker.py new file mode 100644 index 0000000000..a8963ab06d --- /dev/null +++ b/content/develop/use-cases/streaming/redis-py/consumer_worker.py @@ -0,0 +1,167 @@ +""" +Background consumer thread for a single consumer in a consumer group. + +Each worker owns a daemon thread that loops on ``XREADGROUP`` with a +short block timeout and acks every entry it processes. Two demo-only +levers are wired into the loop: + +* ``pause()`` parks the worker (so its pending entries age into the + ``XAUTOCLAIM`` window). +* ``crash_next(n)`` tells the worker to drop its next ``n`` deliveries + on the floor without acking them — the same effect as a worker + process dying mid-message. Those entries stay in the group's PEL + until claimed. + +Real consumers do not need either lever; they only need the +``XREADGROUP`` -> process -> ``XACK`` loop in ``_run``. 
+""" + +from __future__ import annotations + +import threading +import time +from collections import deque +from typing import Optional + +from event_stream import RedisEventStream + + +class ConsumerWorker: + """One consumer in a consumer group, running in its own thread.""" + + def __init__( + self, + stream: RedisEventStream, + group: str, + name: str, + process_latency_ms: int = 25, + recent_capacity: int = 20, + ) -> None: + self.stream = stream + self.group = group + self.name = name + self.process_latency_ms = process_latency_ms + + self._recent: deque[dict] = deque(maxlen=recent_capacity) + self._lock = threading.Lock() + self._processed = 0 + self._crashed_drops = 0 + + self._paused = threading.Event() + self._crash_next = 0 + self._stop_event = threading.Event() + self._thread: Optional[threading.Thread] = None + + # ------------------------------------------------------------------ + # Lifecycle + # ------------------------------------------------------------------ + + def start(self) -> None: + if self._thread and self._thread.is_alive(): + return + self._stop_event.clear() + self._thread = threading.Thread( + target=self._run, + name=f"consumer-{self.group}-{self.name}", + daemon=True, + ) + self._thread.start() + + def stop(self, timeout: float = 1.0) -> None: + self._stop_event.set() + if self._thread: + self._thread.join(timeout=timeout) + + # ------------------------------------------------------------------ + # Demo levers + # ------------------------------------------------------------------ + + def pause(self) -> None: + self._paused.set() + + def resume(self) -> None: + self._paused.clear() + + def crash_next(self, count: int) -> None: + """Drop the next ``count`` deliveries without acking them. + + The entries stay in the group's PEL with their delivery + counter incremented, so ``XAUTOCLAIM`` can recover them once + they exceed the idle threshold. 
+ """ + with self._lock: + self._crash_next += max(0, int(count)) + + # ------------------------------------------------------------------ + # Introspection + # ------------------------------------------------------------------ + + def recent(self) -> list[dict]: + with self._lock: + return list(self._recent) + + def status(self) -> dict: + with self._lock: + return { + "name": self.name, + "group": self.group, + "processed": self._processed, + "crashed_drops": self._crashed_drops, + "paused": self._paused.is_set(), + "crash_queued": self._crash_next, + "alive": bool(self._thread and self._thread.is_alive()), + } + + # ------------------------------------------------------------------ + # Main loop + # ------------------------------------------------------------------ + + def _run(self) -> None: + while not self._stop_event.is_set(): + if self._paused.is_set(): + time.sleep(0.05) + continue + try: + entries = self.stream.consume( + self.group, self.name, count=10, block_ms=500, + ) + except Exception as exc: + # Don't kill the thread on a transient Redis error; a + # real consumer would log this and back off. 
+ print(f"[{self.group}/{self.name}] read failed: {exc}") + time.sleep(0.5) + continue + + for entry_id, fields in entries: + if self.process_latency_ms: + time.sleep(self.process_latency_ms / 1000.0) + self._handle_entry(entry_id, fields) + + def _handle_entry(self, entry_id: str, fields: dict[str, str]) -> None: + with self._lock: + drop = self._crash_next > 0 + if drop: + self._crash_next -= 1 + + if drop: + with self._lock: + self._crashed_drops += 1 + self._recent.appendleft({ + "id": entry_id, + "type": fields.get("type", ""), + "fields": fields, + "acked": False, + "note": "dropped (simulated crash)", + }) + return + + self.stream.ack(self.group, [entry_id]) + with self._lock: + self._processed += 1 + self._recent.appendleft({ + "id": entry_id, + "type": fields.get("type", ""), + "fields": fields, + "acked": True, + "note": "", + }) diff --git a/content/develop/use-cases/streaming/redis-py/demo_server.py b/content/develop/use-cases/streaming/redis-py/demo_server.py new file mode 100644 index 0000000000..3a4b276610 --- /dev/null +++ b/content/develop/use-cases/streaming/redis-py/demo_server.py @@ -0,0 +1,826 @@ +#!/usr/bin/env python3 +""" +Redis streaming demo server. + +Run this file and visit http://localhost:8083 to watch a Redis Stream +in action: producers append events to a single stream, two independent +consumer groups read the same stream at their own pace, and within +the ``notifications`` group two consumers share the work. + +Use the UI to: + +* Produce events into the stream. +* Watch each consumer group's last-delivered ID, PEL count, and the + consumers inside it. +* Drop the next ``N`` messages from a chosen consumer to simulate a + crash mid-processing, then run ``XAUTOCLAIM`` to reassign the + stuck entries to a healthy consumer. +* Replay any ID range with ``XRANGE`` to confirm the history is + independent of consumer-group state. +* Trim the stream with ``XTRIM`` to bound retention. 
+""" + +from __future__ import annotations + +import argparse +import json +import random +import sys +import time +from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer +from pathlib import Path +from urllib.parse import parse_qs, urlparse + +sys.path.insert(0, str(Path(__file__).resolve().parent)) + +try: + import redis + + from consumer_worker import ConsumerWorker + from event_stream import RedisEventStream +except ImportError as exc: + print(f"Error: {exc}") + print("Make sure the 'redis' package is installed: pip install redis") + sys.exit(1) + + +EVENT_TYPES = ["order.placed", "order.paid", "order.shipped", "order.cancelled"] +DEFAULT_GROUPS: dict[str, list[str]] = { + "notifications": ["worker-a", "worker-b"], + "analytics": ["worker-c"], +} + + +HTML_TEMPLATE = """ + + + + + Redis Streaming Demo + + + +
+ redis-py + Python standard library HTTP server
+ Redis Streaming Demo
+ Producers append events to a single Redis Stream
+ (__STREAM_KEY__). Two consumer groups read the same
+ stream independently: notifications shares its work
+ across two consumers, analytics processes the full
+ flow on its own. Acknowledge with XACK, recover
+ crashed deliveries with XAUTOCLAIM, replay any range
+ with XRANGE, and bound retention with XTRIM.
+
+ Stream state
+ Loading...
+
+ Produce events
+ Events are appended with XADD with an approximate
+ MAXLEN ~ __MAXLEN__ retention cap.
+
+ Replay range (XRANGE)
+ Reads a slice of history. Replay is independent of any
+ consumer group: no cursors move, no acks happen.
+
+ Trim retention (XTRIM)
+ Cap the stream length. Approximate trimming releases whole
+ macro-nodes, which is much cheaper than exact trimming.
+
+ Consumer groups
+ Loading...
+
+ Pending entries (XPENDING)
+ Entries delivered to a consumer that haven't been acked yet.
+ Idle time ≥ __CLAIM_IDLE__ ms is eligible for
+ XAUTOCLAIM.
+ Loading...
+
+ Last result
+ Produce events, replay a range, or trigger an autoclaim to see results.
+ + + + +""" + + +class StreamingDemo: + """In-memory registry of consumer workers across all groups.""" + + def __init__(self, stream: RedisEventStream) -> None: + self.stream = stream + self.workers: dict[tuple[str, str], ConsumerWorker] = {} + + def seed(self, groups: dict[str, list[str]]) -> int: + for group, names in groups.items(): + self.stream.ensure_group(group, start_id="0-0") + for name in names: + self.add_worker(group, name) + return sum(len(v) for v in groups.values()) + + def add_worker(self, group: str, name: str) -> bool: + key = (group, name) + if key in self.workers: + return False + self.stream.ensure_group(group, start_id="0-0") + worker = ConsumerWorker(self.stream, group=group, name=name) + worker.start() + self.workers[key] = worker + return True + + def remove_worker(self, group: str, name: str) -> bool: + key = (group, name) + worker = self.workers.pop(key, None) + if worker is None: + return False + worker.stop() + self.stream.delete_consumer(group, name) + return True + + def get_worker(self, group: str, name: str) -> ConsumerWorker | None: + return self.workers.get((group, name)) + + def stop_all(self) -> None: + for worker in list(self.workers.values()): + worker.stop() + self.workers.clear() + + def reset(self) -> int: + self.stop_all() + self.stream.delete_stream() + self.stream.reset_stats() + return self.seed(DEFAULT_GROUPS) + + +class StreamingDemoHandler(BaseHTTPRequestHandler): + """HTTP handler. 
Server-state is hung off class attributes.""" + + stream: RedisEventStream | None = None + demo: StreamingDemo | None = None + + def do_GET(self) -> None: + parsed = urlparse(self.path) + if parsed.path in {"/", "/index.html"}: + self._send_html(self._html_page()) + return + if parsed.path == "/state": + self._send_json(self._build_state(), 200) + return + if parsed.path == "/replay": + self._handle_replay(parse_qs(parsed.query)) + return + self.send_error(404) + + def do_POST(self) -> None: + parsed = urlparse(self.path) + if parsed.path == "/produce": + self._handle_produce() + return + if parsed.path == "/add-worker": + self._handle_add_worker() + return + if parsed.path == "/remove-worker": + self._handle_remove_worker() + return + if parsed.path == "/crash": + self._handle_crash() + return + if parsed.path == "/autoclaim": + self._handle_autoclaim() + return + if parsed.path == "/trim": + self._handle_trim() + return + if parsed.path == "/reset": + count = self.demo.reset() + self._send_json({"groups": count}, 200) + return + self.send_error(404) + + # ---- POST handlers ---------------------------------------------- + + def _handle_produce(self) -> None: + params = self._read_form_data() + count = max(1, min(500, int(params.get("count", ["1"])[0] or "1"))) + event_type = (params.get("type", [""])[0] or "").strip() + events = [] + for _ in range(count): + picked = event_type or random.choice(EVENT_TYPES) + events.append((picked, _fake_payload())) + ids = self.stream.produce_batch(events) + self._send_json({"produced": len(ids), "ids": ids}, 200) + + def _handle_add_worker(self) -> None: + params = self._read_form_data() + group = params.get("group", [""])[0].strip() + name = params.get("name", [""])[0].strip() + if not group or not name: + self._send_json({"error": "group and name are required"}, 400) + return + added = self.demo.add_worker(group, name) + if not added: + self._send_json({"error": f"{group}/{name} already exists"}, 409) + return + 
self._send_json({"group": group, "name": name}, 200) + + def _handle_remove_worker(self) -> None: + params = self._read_form_data() + group = params.get("group", [""])[0].strip() + name = params.get("name", [""])[0].strip() + removed = self.demo.remove_worker(group, name) + self._send_json({"removed": removed}, 200) + + def _handle_crash(self) -> None: + params = self._read_form_data() + group = params.get("group", [""])[0].strip() + name = params.get("name", [""])[0].strip() + count = int(params.get("count", ["1"])[0] or "1") + worker = self.demo.get_worker(group, name) + if worker is None: + self._send_json({"error": f"unknown consumer {group}/{name}"}, 404) + return + worker.crash_next(count) + self._send_json({"queued": count}, 200) + + def _handle_autoclaim(self) -> None: + params = self._read_form_data() + group = params.get("group", [""])[0].strip() + consumer = params.get("consumer", [""])[0].strip() + if not group or not consumer: + self._send_json({"error": "group and consumer are required"}, 400) + return + claimed = self.stream.autoclaim(group, consumer, count=100) + entries = [ + {"id": entry_id, "fields": fields} for entry_id, fields in claimed + ] + self._send_json( + { + "claimed": len(claimed), + "entries": entries, + "min_idle_ms": self.stream.claim_min_idle_ms, + }, + 200, + ) + + def _handle_trim(self) -> None: + params = self._read_form_data() + maxlen = int(params.get("maxlen", ["0"])[0] or "0") + deleted = self.stream.trim_maxlen(maxlen) + self._send_json({"deleted": deleted, "maxlen": maxlen}, 200) + + def _handle_replay(self, query: dict[str, list[str]]) -> None: + start = query.get("start", ["-"])[0] or "-" + end = query.get("end", ["+"])[0] or "+" + limit = max(1, min(500, int(query.get("count", ["20"])[0] or "20"))) + entries = self.stream.replay(start, end, count=limit) + self._send_json( + { + "start": start, + "end": end, + "limit": limit, + "entries": [ + {"id": entry_id, "fields": fields} + for entry_id, fields in entries + ], + }, 
+            200,
+        )
+
+    # ---- State assembly ---------------------------------------------
+
+    def _build_state(self) -> dict:
+        stream_info = self.stream.info_stream()
+        groups = self.stream.info_groups()
+
+        groups_detail = []
+        pending_rows: list[dict] = []
+        for group in groups:
+            name = group["name"]
+            consumer_info = {c["name"]: c for c in self.stream.info_consumers(name)}
+            consumers_detail = []
+            for (g_name, c_name), worker in self.demo.workers.items():
+                if g_name != name:
+                    continue
+                info = consumer_info.get(c_name, {})
+                status = worker.status()
+                consumers_detail.append({
+                    **status,
+                    "pending": info.get("pending", 0),
+                    "idle_ms": info.get("idle_ms", 0),
+                    "recent": worker.recent(),
+                })
+            # Also include consumers that exist in Redis but not in
+            # our in-process registry (e.g. orphaned after a restart).
+            for c_name, info in consumer_info.items():
+                if not any(c["name"] == c_name for c in consumers_detail):
+                    consumers_detail.append({
+                        "name": c_name,
+                        "group": name,
+                        "processed": 0,
+                        "crashed_drops": 0,
+                        "paused": False,
+                        "crash_queued": 0,
+                        "alive": False,
+                        "pending": info.get("pending", 0),
+                        "idle_ms": info.get("idle_ms", 0),
+                        "recent": [],
+                    })
+            consumers_detail.sort(key=lambda c: c["name"])
+            groups_detail.append({**group, "consumers_detail": consumers_detail})
+
+            for row in self.stream.pending_detail(name, count=50):
+                pending_rows.append({**row, "group": name})
+
+        # Newest first: XRANGE with COUNT would return the oldest entries.
+        tail_entries = self.stream.redis.xrevrange(self.stream.stream_key, count=10)
+        tail = [
+            {"id": entry_id, "fields": fields} for entry_id, fields in tail_entries
+        ]
+
+        return {
+            "stream": stream_info,
+            "tail": tail,
+            "groups": groups_detail,
+            "pending": pending_rows,
+            "stats": self.stream.stats(),
+        }
+
+    # ---- HTTP plumbing ----------------------------------------------
+
+    def _read_form_data(self) -> dict[str, list[str]]:
+        content_length = int(self.headers.get("Content-Length", "0"))
+        raw_body =
self.rfile.read(content_length).decode("utf-8") + return parse_qs(raw_body) + + def _send_html(self, html: str, status: int = 200) -> None: + self.send_response(status) + self.send_header("Content-Type", "text/html; charset=utf-8") + self.end_headers() + self.wfile.write(html.encode("utf-8")) + + def _send_json(self, payload: dict, status: int) -> None: + self.send_response(status) + self.send_header("Content-Type", "application/json") + self.end_headers() + self.wfile.write(json.dumps(payload).encode("utf-8")) + + def log_message(self, format: str, *args) -> None: # noqa: A002 + sys.stderr.write(f"[demo] {format % args}\n") + + def _html_page(self) -> str: + return ( + HTML_TEMPLATE + .replace("__STREAM_KEY__", self.stream.stream_key) + .replace("__MAXLEN__", str(self.stream.maxlen_approx)) + .replace("__CLAIM_IDLE__", str(self.stream.claim_min_idle_ms)) + ) + + +def _fake_payload() -> dict: + return { + "order_id": f"o-{random.randint(1000, 9999)}", + "customer": random.choice(["alice", "bob", "carol", "dan", "erin"]), + "amount": f"{random.uniform(5, 250):.2f}", + } + + +def parse_args() -> argparse.Namespace: + parser = argparse.ArgumentParser(description="Run the Redis streaming demo server.") + parser.add_argument("--host", default="127.0.0.1", help="HTTP bind host") + parser.add_argument("--port", type=int, default=8083, help="HTTP bind port") + parser.add_argument("--redis-host", default="localhost", help="Redis host") + parser.add_argument("--redis-port", type=int, default=6379, help="Redis port") + parser.add_argument( + "--stream-key", + default="demo:events:orders", + help="Redis Stream key", + ) + parser.add_argument( + "--maxlen", + type=int, + default=2000, + help="Approximate MAXLEN cap on every XADD", + ) + parser.add_argument( + "--claim-idle-ms", + type=int, + default=5000, + help="Minimum idle time before XAUTOCLAIM may reassign a pending entry", + ) + return parser.parse_args() + + +def main() -> None: + args = parse_args() + + redis_client = 
redis.Redis(
+        host=args.redis_host,
+        port=args.redis_port,
+        decode_responses=True,
+    )
+    stream = RedisEventStream(
+        redis_client=redis_client,
+        stream_key=args.stream_key,
+        maxlen_approx=args.maxlen,
+        claim_min_idle_ms=args.claim_idle_ms,
+    )
+    demo = StreamingDemo(stream)
+    stream.delete_stream()
+    seeded = demo.seed(DEFAULT_GROUPS)
+
+    StreamingDemoHandler.stream = stream
+    StreamingDemoHandler.demo = demo
+
+    print(f"Redis streaming demo server listening on http://{args.host}:{args.port}")
+    print(
+        f"Using Redis at {args.redis_host}:{args.redis_port}"
+        f" with stream key '{args.stream_key}' (MAXLEN ~ {args.maxlen})"
+    )
+    print(f"Seeded {seeded} consumer(s) across {len(DEFAULT_GROUPS)} group(s)")
+
+    server = ThreadingHTTPServer((args.host, args.port), StreamingDemoHandler)
+    try:
+        server.serve_forever()
+    except KeyboardInterrupt:
+        pass
+    finally:
+        demo.stop_all()
+
+
+if __name__ == "__main__":
+    main()
diff --git a/content/develop/use-cases/streaming/redis-py/event_stream.py b/content/develop/use-cases/streaming/redis-py/event_stream.py
new file mode 100644
index 0000000000..adb9b0bea9
--- /dev/null
+++ b/content/develop/use-cases/streaming/redis-py/event_stream.py
@@ -0,0 +1,321 @@
+"""
+Redis event-stream helper backed by a single Redis Stream.
+
+Producers append events with ``XADD``. Consumers belong to consumer
+groups and read with ``XREADGROUP``: the group tracks one shared cursor
+and each consumer gets a pending-entries list (PEL) of in-flight messages.
+Once a consumer has processed an entry it acknowledges it with
+``XACK``; entries left unacknowledged past an idle threshold can be
+swept to a healthy consumer with ``XAUTOCLAIM`` (or to a specific one
+with ``XCLAIM``).
+
+Each ``XADD`` carries an approximate ``MAXLEN`` so the stream stays
+bounded as it rolls forward. ``XRANGE`` supports replay from any point
+in history for debugging, audit, or rebuilding a downstream projection.
+ +The same stream can be read by any number of consumer groups — each +group has its own cursor and its own pending list, so analytics, +notifications, and audit can all process the full event flow at their +own pace without coordinating with each other. +""" + +from __future__ import annotations + +import time +from threading import Lock +from typing import Iterable, Optional + +import redis + + +Entry = tuple[str, dict[str, str]] + + +class RedisEventStream: + """Producer/consumer helper for a single Redis Stream with consumer groups.""" + + def __init__( + self, + redis_client: Optional[redis.Redis] = None, + stream_key: str = "demo:events:orders", + maxlen_approx: int = 10_000, + claim_min_idle_ms: int = 15_000, + ) -> None: + self.redis = redis_client or redis.Redis( + host="localhost", + port=6379, + decode_responses=True, + ) + self.stream_key = stream_key + self.maxlen_approx = maxlen_approx + self.claim_min_idle_ms = claim_min_idle_ms + + self._stats_lock = Lock() + self._produced_total = 0 + self._acked_total = 0 + self._claimed_total = 0 + + # ------------------------------------------------------------------ + # Producer + # ------------------------------------------------------------------ + + def produce(self, event_type: str, payload: dict) -> str: + """Append a single event. Returns the stream ID Redis assigned.""" + return self.produce_batch([(event_type, payload)])[0] + + def produce_batch(self, events: Iterable[tuple[str, dict]]) -> list[str]: + """Pipeline several ``XADD`` calls in one round trip. + + Each entry carries an approximate ``MAXLEN`` cap. The ``~`` + flavour lets Redis trim at a macro-node boundary, which is + much cheaper than exact trimming and is the right call for a + retention guardrail rather than a hard size limit. 
+ """ + pipe = self.redis.pipeline(transaction=False) + for event_type, payload in events: + fields = self._encode_fields(event_type, payload) + pipe.xadd( + self.stream_key, + fields, + maxlen=self.maxlen_approx, + approximate=True, + ) + ids = pipe.execute() + with self._stats_lock: + self._produced_total += len(ids) + return list(ids) + + @staticmethod + def _encode_fields(event_type: str, payload: dict) -> dict[str, str]: + fields: dict[str, str] = { + "type": event_type, + "ts_ms": str(int(time.time() * 1000)), + } + for key, value in payload.items(): + fields[key] = "" if value is None else str(value) + return fields + + # ------------------------------------------------------------------ + # Consumer groups + # ------------------------------------------------------------------ + + def ensure_group(self, group: str, start_id: str = "$") -> None: + """Create the consumer group if it doesn't exist. + + ``$`` means "deliver only events appended after this point"; + pass ``0-0`` to replay the entire stream into a fresh group. + """ + try: + self.redis.xgroup_create( + self.stream_key, group, id=start_id, mkstream=True, + ) + except redis.ResponseError as exc: + if "BUSYGROUP" not in str(exc): + raise + + def delete_group(self, group: str) -> int: + return int(self.redis.xgroup_destroy(self.stream_key, group)) + + def consume( + self, + group: str, + consumer: str, + count: int = 10, + block_ms: int = 500, + ) -> list[Entry]: + """Read new entries for this consumer via ``XREADGROUP``. + + The ``>`` ID means "deliver entries this consumer group has not + delivered to *anyone* yet" — that is the at-least-once path. + Replaying an explicit ID instead would re-deliver an entry that + is already in this consumer's pending list (used to recover + after a crash on the same consumer name). 
+        """
+        result = self.redis.xreadgroup(
+            group,
+            consumer,
+            {self.stream_key: ">"},
+            count=count,
+            block=block_ms,
+        )
+        return _flatten_entries(result)
+
+    def ack(self, group: str, ids: Iterable[str]) -> int:
+        ids = list(ids)
+        if not ids:
+            return 0
+        n = int(self.redis.xack(self.stream_key, group, *ids))
+        with self._stats_lock:
+            self._acked_total += n
+        return n
+
+    def autoclaim(
+        self,
+        group: str,
+        consumer: str,
+        count: int = 100,
+        start_id: str = "0-0",
+    ) -> list[Entry]:
+        """Sweep idle pending entries to ``consumer``.
+
+        ``XAUTOCLAIM`` walks the group's PEL from ``start_id`` and
+        reassigns up to ``count`` entries that have been idle at least
+        ``claim_min_idle_ms`` to the named consumer. The reassigned
+        entry's delivery counter is incremented so a poison-pill
+        message can be detected after a few claim cycles.
+        """
+        _next_id, claimed, _deleted = self.redis.xautoclaim(
+            self.stream_key,
+            group,
+            consumer,
+            min_idle_time=self.claim_min_idle_ms,
+            start_id=start_id,
+            count=count,
+        )
+        with self._stats_lock:
+            self._claimed_total += len(claimed)
+        return list(claimed)
+
+    def delete_consumer(self, group: str, consumer: str) -> int:
+        """Remove a consumer; entries still in its PEL become unclaimable."""
+        try:
+            return int(self.redis.xgroup_delconsumer(
+                self.stream_key, group, consumer,
+            ))
+        except redis.ResponseError:
+            return 0
+
+    # ------------------------------------------------------------------
+    # Replay, length, trim
+    # ------------------------------------------------------------------
+
+    def replay(
+        self,
+        start_id: str = "-",
+        end_id: str = "+",
+        count: int = 100,
+    ) -> list[Entry]:
+        """Range read with ``XRANGE`` for replay or audit.
+
+        Read-only: ranges do not update any group cursor and do not
+        ack anything. Useful for bootstrapping a new projection, for
+        building an audit view, or for debugging what actually went
+        through the stream.
+ """ + return list(self.redis.xrange( + self.stream_key, min=start_id, max=end_id, count=count, + )) + + def length(self) -> int: + return int(self.redis.xlen(self.stream_key)) + + def trim_maxlen(self, maxlen: int) -> int: + return int(self.redis.xtrim( + self.stream_key, maxlen=maxlen, approximate=True, + )) + + def trim_minid(self, minid: str) -> int: + return int(self.redis.xtrim( + self.stream_key, minid=minid, approximate=True, + )) + + # ------------------------------------------------------------------ + # Inspection + # ------------------------------------------------------------------ + + def info_stream(self) -> dict: + """Subset of ``XINFO STREAM`` that's safe to JSON-encode.""" + try: + raw = self.redis.xinfo_stream(self.stream_key) + except redis.ResponseError: + return {"length": 0, "last_generated_id": None, + "first_entry_id": None, "last_entry_id": None} + first = raw.get("first-entry") + last = raw.get("last-entry") + return { + "length": int(raw.get("length", 0)), + "last_generated_id": raw.get("last-generated-id"), + "first_entry_id": first[0] if first else None, + "last_entry_id": last[0] if last else None, + } + + def info_groups(self) -> list[dict]: + try: + rows = self.redis.xinfo_groups(self.stream_key) + except redis.ResponseError: + return [] + return [ + { + "name": row["name"], + "consumers": int(row.get("consumers", 0)), + "pending": int(row.get("pending", 0)), + "last_delivered_id": row.get("last-delivered-id"), + "lag": int(row["lag"]) if row.get("lag") is not None else None, + } + for row in rows + ] + + def info_consumers(self, group: str) -> list[dict]: + try: + rows = self.redis.xinfo_consumers(self.stream_key, group) + except redis.ResponseError: + return [] + return [ + { + "name": row["name"], + "pending": int(row.get("pending", 0)), + "idle_ms": int(row.get("idle", 0)), + } + for row in rows + ] + + def pending_detail(self, group: str, count: int = 20) -> list[dict]: + """Per-entry PEL view (id, consumer, idle, 
deliveries).""" + try: + rows = self.redis.xpending_range( + self.stream_key, group, min="-", max="+", count=count, + ) + except redis.ResponseError: + return [] + return [ + { + "id": row["message_id"], + "consumer": row["consumer"], + "idle_ms": int(row["time_since_delivered"]), + "deliveries": int(row["times_delivered"]), + } + for row in rows + ] + + def stats(self) -> dict[str, int]: + with self._stats_lock: + return { + "produced_total": self._produced_total, + "acked_total": self._acked_total, + "claimed_total": self._claimed_total, + } + + def reset_stats(self) -> None: + with self._stats_lock: + self._produced_total = 0 + self._acked_total = 0 + self._claimed_total = 0 + + # ------------------------------------------------------------------ + # Demo housekeeping + # ------------------------------------------------------------------ + + def delete_stream(self) -> None: + """Drop the stream key entirely. Used by the demo's reset path.""" + self.redis.delete(self.stream_key) + + +def _flatten_entries(raw) -> list[Entry]: + """Flatten ``XREADGROUP`` output into a list of ``(id, fields)``.""" + if not raw: + return [] + out: list[Entry] = [] + for _stream, entries in raw: + for entry_id, fields in entries: + out.append((entry_id, fields)) + return out From b200e837426086325c43736589bf09cbc0baeed1 Mon Sep 17 00:00:00 2001 From: Andy Stark Date: Thu, 14 May 2026 14:30:05 +0100 Subject: [PATCH 2/4] DOC-6619 improvements suggested by review --- content/develop/use-cases/streaming/_index.md | 10 +- .../use-cases/streaming/redis-py/_index.md | 93 +++++-- .../streaming/redis-py/consumer_worker.py | 91 ++++++- .../streaming/redis-py/demo_server.py | 244 ++++++++++++++---- .../streaming/redis-py/event_stream.py | 143 ++++++++-- 5 files changed, 467 insertions(+), 114 deletions(-) diff --git a/content/develop/use-cases/streaming/_index.md b/content/develop/use-cases/streaming/_index.md index 933db116ee..5c57ebfbc7 100644 --- 
a/content/develop/use-cases/streaming/_index.md +++ b/content/develop/use-cases/streaming/_index.md @@ -52,7 +52,10 @@ You can: - Replay historical events for debugging, bootstrapping a new projection, or rebuilding a downstream system from scratch. - Bound memory by retaining events by length or by minimum ID, without a separate cleanup job. -- Recover unacknowledged entries from crashed consumers so no event sits invisibly in flight. +- Recover unacknowledged entries from crashed consumers, so a worker dying mid-message does not + silently lose work (entries trimmed by `MAXLEN ~` before they are acked are surfaced in + `XAUTOCLAIM`'s deleted-IDs list, so the caller can route them to a dead-letter store rather + than retry against a missing payload). - Partition streams by tenant, region, or entity for load distribution and per-entity event sourcing. - Replace a dedicated Kafka deployment for moderate-scale, short-retention streaming workloads @@ -64,8 +67,9 @@ In practice, producers append events to a stream with [`XADD`]({{< relref "/commands/xadd" >}}) and Redis assigns each entry an auto-generated, time-ordered ID. Consumers either read the stream directly with [`XREAD`]({{< relref "/commands/xread" >}}), or join a *consumer group* and read with -[`XREADGROUP`]({{< relref "/commands/xreadgroup" >}}), which gives every consumer a private -cursor and a pending-entries list of in-flight messages. Once a consumer finishes processing an +[`XREADGROUP`]({{< relref "/commands/xreadgroup" >}}). Each consumer gets its own +pending-entries list of in-flight messages, while the group as a whole tracks a single +`last-delivered-id` cursor that advances as entries are handed out to any consumer. 
Once a consumer finishes processing an entry, it acknowledges it with [`XACK`]({{< relref "/commands/xack" >}}); entries left unacknowledged past a timeout can be reassigned to a healthy consumer with [`XCLAIM`]({{< relref "/commands/xclaim" >}}) or diff --git a/content/develop/use-cases/streaming/redis-py/_index.md b/content/develop/use-cases/streaming/redis-py/_index.md index d48dbe217a..964fe8609b 100644 --- a/content/develop/use-cases/streaming/redis-py/_index.md +++ b/content/develop/use-cases/streaming/redis-py/_index.md @@ -16,7 +16,7 @@ This guide shows you how to build a Redis-backed event-streaming pipeline in Pyt ## Overview -A Redis Stream is an append-only log of field/value entries with auto-generated, time-ordered IDs. Producers append with [`XADD`]({{< relref "/commands/xadd" >}}); consumers belong to *consumer groups* and read with [`XREADGROUP`]({{< relref "/commands/xreadgroup" >}}), which gives each consumer a private cursor and a pending-entries list (PEL) of in-flight messages. Once a consumer has processed an entry it acknowledges it with [`XACK`]({{< relref "/commands/xack" >}}); entries left unacknowledged past an idle threshold can be reassigned to a healthy consumer with [`XAUTOCLAIM`]({{< relref "/commands/xautoclaim" >}}). +A Redis Stream is an append-only log of field/value entries with auto-generated, time-ordered IDs. Producers append with [`XADD`]({{< relref "/commands/xadd" >}}); consumers belong to *consumer groups* and read with [`XREADGROUP`]({{< relref "/commands/xreadgroup" >}}). The group as a whole tracks a single `last-delivered-id` cursor, and each consumer gets its own pending-entries list (PEL) of messages it has been handed but not yet acknowledged. Once a consumer has processed an entry it calls [`XACK`]({{< relref "/commands/xack" >}}) to clear the entry from its PEL; entries left unacknowledged past an idle threshold can be reassigned to a healthy consumer with [`XAUTOCLAIM`]({{< relref "/commands/xautoclaim" >}}). 
That gives you: @@ -74,8 +74,17 @@ for entry_id, fields in entries: handle(fields) # your processing stream.ack("notifications", [entry_id]) # XACK -# Recover entries from a crashed consumer (idle ≥ claim_min_idle_ms) -stream.autoclaim("notifications", "worker-b", count=100) +# Recover stuck PEL entries by reaping them into a healthy consumer. +# The textbook pattern: each consumer periodically calls XAUTOCLAIM +# with itself as the target and processes whatever it claimed. +# `ConsumerWorker.reap_idle_pel` wraps that flow; the low-level helper +# `stream.autoclaim(group, target_name)` is also available if you +# want to drive XAUTOCLAIM directly. +result = worker_b.reap_idle_pel() +# result == {"claimed": N, "processed": M, "deleted_ids": [...]} +# deleted_ids are PEL entries whose payload was already trimmed. +# Redis 7+ has already removed those slots from the PEL, so no XACK +# is needed — log them and route to a dead-letter store for audit. # Replay history (independent of any group's cursor) for entry_id, fields in stream.replay("-", "+", count=50): @@ -94,7 +103,7 @@ demo:events:orders ... ``` -The ID is `{milliseconds}-{sequence}`, so IDs are globally ordered and you can range-query by approximate wall-clock time without any extra index. The implementation uses: +The ID is `{milliseconds}-{sequence}`, monotonically increasing within the stream, so you can range-query by approximate wall-clock time without an extra index. (IDs are ordered within a stream, not across streams — two events appended to different streams at the same millisecond can produce the same ID.) The implementation uses: * [`XADD ... 
MAXLEN ~ n`]({{< relref "/commands/xadd" >}}), pipelined, for batch production with a retention cap * [`XREADGROUP`]({{< relref "/commands/xreadgroup" >}}) with the special ID `>` for fresh deliveries to a consumer @@ -169,7 +178,7 @@ def ack(self, group: str, ids: Iterable[str]) -> int: return int(self.redis.xack(self.stream_key, group, *ids)) ``` -This is the linchpin of at-least-once delivery: an entry that is never acked stays in the PEL forever (until a claim moves it elsewhere). If your consumer thread crashes between processing and ack, the entry is *retained*, not lost — the next claim sweep picks it up. +This is the linchpin of at-least-once delivery: an entry that is never acked stays in the PEL until a claim moves it elsewhere. If your consumer thread crashes between processing and ack, the next claim sweep picks the entry back up. The one caveat is retention: `XADD MAXLEN ~` and `XTRIM` can release the entry's *payload* even while its ID is still in the PEL. The next `XAUTOCLAIM` returns those IDs in its `deleted` list and removes them from the PEL inside the same command — the entry cannot be retried, so the caller should log it and route to a dead-letter store for audit. The example handles this explicitly in `_handle_autoclaim` further down. The trade-off is the opposite of pub/sub: a slow or crashed consumer doesn't lose messages, but it does mean your downstream system must be idempotent. If you process an order twice because the first attempt died after the side effect but before the ack, the second attempt must be safe. @@ -188,32 +197,64 @@ The `start_id="0-0"` argument means "deliver everything in the stream from the b ## Recovering crashed consumers with XAUTOCLAIM -The demo's "Crash next 3" button tells a chosen consumer to drop its next three deliveries on the floor without acking them — the same effect as a worker process dying mid-message. Those entries stay in the group's PEL with their delivery counter incremented. 
Once they have been idle for at least `claim_min_idle_ms`, the recovery sweep picks them up: +The demo's "Crash next 3" button tells a chosen consumer to drop its next three deliveries on the floor without acking them — the same effect as a worker process dying mid-message. Those entries stay in the group's PEL with their delivery counter incremented. Once they have been idle for at least `claim_min_idle_ms`, any healthy consumer in the group can rescue them by calling `XAUTOCLAIM` *with itself as the target*. `ConsumerWorker.reap_idle_pel` wraps that pattern: + +```python +def reap_idle_pel(self) -> dict: + claimed, deleted = self.stream.autoclaim( + self.group, self.name, page_count=100, max_pages=10, + ) + processed = 0 + for entry_id, fields in claimed: + try: + self._handle_entry(entry_id, fields) + processed += 1 + except Exception as exc: + print(f"reap failed on {entry_id}: {exc}") + return { + "claimed": len(claimed), + "deleted_ids": deleted, + "processed": processed, + } +``` + +The underlying `stream.autoclaim` helper pages through the group's PEL with `XAUTOCLAIM`'s continuation cursor: ```python def autoclaim( self, group: str, consumer: str, - count: int = 100, + page_count: int = 100, start_id: str = "0-0", -) -> list[Entry]: - _next_id, claimed, _deleted = self.redis.xautoclaim( - self.stream_key, - group, - consumer, - min_idle_time=self.claim_min_idle_ms, - start_id=start_id, - count=count, - ) - return list(claimed) + max_pages: int = 10, +) -> tuple[list[Entry], list[str]]: + claimed_all, deleted_all = [], [] + cursor = start_id + for _ in range(max_pages): + next_id, claimed, deleted = self.redis.xautoclaim( + self.stream_key, + group, + consumer, + min_idle_time=self.claim_min_idle_ms, + start_id=cursor, + count=page_count, + ) + claimed_all.extend(claimed) + deleted_all.extend(deleted or []) + if next_id == "0-0": + break + cursor = next_id + return claimed_all, deleted_all ``` -`XAUTOCLAIM` walks the group's PEL, finds every entry idle 
longer than `min_idle_time`, reassigns it to the named consumer, and returns the reassigned entries. The delivery counter is incremented on every claim — after a few cycles you can use it to detect a *poison-pill* message that crashes every consumer that touches it, and route it to a dead-letter stream instead of looping forever. +A single `XAUTOCLAIM` call walks the PEL from `start_id`, reassigns up to `page_count` entries that have been idle for at least `min_idle_time` to the named consumer, and returns a continuation cursor in the first slot of the reply. For a full sweep, loop until the cursor returns to `0-0` (with a `max_pages` safety net so one call cannot monopolise a very large PEL). The delivery counter is incremented on every claim — after a few cycles you can use it to spot a *poison-pill* message that crashes every consumer that touches it, and route it to a dead-letter stream so the bad entry stops cycling. (New entries keep flowing past the poison pill — `XREADGROUP >` still delivers fresh work — but the bad entry's repeated reclaim wastes consumer time and keeps the PEL larger than it needs to be.) + +The `deleted` list contains PEL entry IDs whose stream payload was already trimmed by the time the claim ran (typically because `MAXLEN ~` retention outran a slow consumer). `XAUTOCLAIM` removes those dangling slots from the PEL itself, so the caller does *not* need to `XACK` them — but the entries cannot be retried either, so log and route them to a dead-letter store for offline inspection. Redis 7.0 introduced this third return element; the example requires Redis 7.0+ for that reason. -In production this loop runs periodically (every few seconds) on every healthy consumer, or on a dedicated reaper. The demo exposes it as a button so you can trigger it manually after waiting for the idle threshold. 
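The poison-pill check described above can be sketched roughly as follows. This is a minimal illustration, not part of the example code: `find_poison_pills`, `route_poison_pills`, the `max_deliveries` threshold, and the dead-letter key are hypothetical names, and `client` is assumed to be a redis-py client created with `decode_responses=True`. The sketch relies on redis-py's `xpending_range`, which reports each pending entry's `times_delivered`.

```python
from typing import Any, Iterable


def find_poison_pills(
    pending_rows: Iterable[dict[str, Any]],
    max_deliveries: int = 5,
) -> list[str]:
    """Return IDs of pending entries delivered at least ``max_deliveries`` times.

    ``pending_rows`` is expected in the shape redis-py returns from
    ``xpending_range``: dicts with ``message_id`` and ``times_delivered``.
    """
    return [
        row["message_id"]
        for row in pending_rows
        if row["times_delivered"] >= max_deliveries
    ]


def route_poison_pills(
    client: Any,               # assumed: redis.Redis(decode_responses=True)
    stream_key: str,
    group: str,
    dead_letter_key: str,      # hypothetical dead-letter stream name
    max_deliveries: int = 5,   # hypothetical threshold
    count: int = 100,
) -> list[str]:
    """Copy over-delivered entries to a dead-letter stream, then ack them."""
    rows = client.xpending_range(stream_key, group, min="-", max="+", count=count)
    moved: list[str] = []
    for entry_id in find_poison_pills(rows, max_deliveries):
        # Fetch the payload; it may be gone if retention already trimmed it.
        entries = client.xrange(stream_key, min=entry_id, max=entry_id)
        fields = dict(entries[0][1]) if entries else {}
        # Keep the original ID so the dead-letter entry can be traced back.
        fields["original_id"] = entry_id
        client.xadd(dead_letter_key, fields)
        # Acking removes the entry from the group's PEL so it stops cycling.
        client.xack(stream_key, group, entry_id)
        moved.append(entry_id)
    return moved
```

With a real connection this might be called as `route_poison_pills(r, "demo:events:orders", "notifications", "demo:events:orders:dead")` from the same timer that runs the reap. `XADD` and `XACK` are separate commands here, so a crash between them can dead-letter an entry twice; whatever reads the dead-letter stream should tolerate duplicates.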
+`reap_idle_pel` is the right primitive for the recovery path because it claims and processes in one step: every entry the call returned is now in *this* consumer's PEL, so the same consumer is responsible for processing and acking it. In production each consumer thread runs `reap_idle_pel` periodically (every few seconds, on a timer) so a crashed peer's entries never sit invisibly. The demo exposes it as a manual button so you can trigger the reap after waiting for the idle threshold. -`XCLAIM` (singular, no auto) does the same thing for a specific list of entry IDs you already have in hand — useful when you want to take ownership of one known stuck entry rather than sweep the whole PEL. +`XCLAIM` (singular, no auto) does the same thing for a specific list of entry IDs you already have in hand — useful when you want to take ownership of one known stuck entry, or when you need to move a specific consumer's PEL to a peer (the case the demo's "Remove consumer" button handles via `handover_pending`). `XAUTOCLAIM` cannot filter by source consumer, so it cannot be used for a per-consumer handover. ## Replay with XRANGE @@ -267,11 +308,16 @@ def _run(self) -> None: `_handle_entry` either acks (the normal path) or, when the demo has asked the worker to "crash", drops the entry on the floor and increments a counter so the UI can show what is currently in the PEL waiting to be claimed. +Recovery of stuck PEL entries — this consumer's, after a restart, or another consumer's, after a crash — runs through a separate `reap_idle_pel` method rather than the read loop. That method calls `XAUTOCLAIM` with this consumer as the target, then processes whatever was claimed in the same flow as new entries. This is the textbook Streams pattern: each consumer is its own reaper, running `XAUTOCLAIM(self)` periodically (or on demand) so a crashed peer's entries never sit invisibly in the PEL. 
The demo's "XAUTOCLAIM to selected" button calls `reap_idle_pel` on the chosen consumer; in production you would run it from a timer every few seconds. + +Note that the worker's main read loop deliberately does *not* call `XREADGROUP 0` to drain its own PEL on every iteration. That would re-deliver every pending entry continuously and *reset its idle counter to zero* each time, which would keep crashed entries below the `XAUTOCLAIM` threshold forever. Using `XAUTOCLAIM(self)` as the recovery primitive — which only fires for entries idle longer than `min_idle_time` — avoids that whole class of bug. + The pause and crash levers exist only for the demo. A real consumer is just the read-process-ack loop — everything else in this class is instrumentation. ## Prerequisites -* Redis 6.2 or later (Redis 7.0+ recommended for `XAUTOCLAIM`). +* Redis 7.0 or later. `XAUTOCLAIM` was added in Redis 6.2, but its reply gained a third + element (the list of deleted IDs) in 7.0; the example relies on that shape. * Python 3.9 or later. * The `redis-py` client. Install it with: @@ -306,11 +352,14 @@ python3 demo_server.py You should see: ```text +Deleting any existing data at key 'demo:events:orders' for a clean demo run (pass --no-reset to keep it). Redis streaming demo server listening on http://127.0.0.1:8083 Using Redis at localhost:6379 with stream key 'demo:events:orders' (MAXLEN ~ 2000) Seeded 3 consumer(s) across 2 group(s) ``` +By default the demo wipes the configured stream key on startup so each run starts from a clean state. Pass `--no-reset` to keep any existing data at the key (useful when re-running against the same stream to inspect prior state), or `--stream-key ` to point the demo at a different key entirely. + Open [http://127.0.0.1:8083](http://127.0.0.1:8083) in a browser. You can: * **Produce** any number of events of a chosen type (or random types). Watch the stream length grow and the tail update. 
@@ -345,7 +394,7 @@ The same applies inside a group: `XINFO CONSUMERS` reports per-consumer pending ### Bound the delivery counter as a poison-pill signal -`XPENDING` returns each entry's delivery count, incremented on every claim. If an entry has been delivered (and dropped) several times, the next consumer is unlikely to fare better. After some threshold — `deliveries >= 5`, say — route the entry to a *dead-letter stream*, ack it on the original group, and alert. Without this, one bad entry can stop the group's forward progress indefinitely. +`XPENDING` returns each entry's delivery count, incremented on every claim. If an entry has been delivered (and dropped) several times, the next consumer is unlikely to fare better. After some threshold — `deliveries >= 5`, say — route the entry to a *dead-letter stream*, ack it on the original group, and alert. New entries keep flowing past a poison pill (`XREADGROUP >` still delivers fresh work), but the bad entry's repeated reclaim wastes consumer time and keeps the PEL bigger than it needs to be — without a DLQ threshold it can also slowly trip retention/lag alerts. ### Partition by tenant or entity for scale diff --git a/content/develop/use-cases/streaming/redis-py/consumer_worker.py b/content/develop/use-cases/streaming/redis-py/consumer_worker.py index a8963ab06d..f42b102d19 100644 --- a/content/develop/use-cases/streaming/redis-py/consumer_worker.py +++ b/content/develop/use-cases/streaming/redis-py/consumer_worker.py @@ -1,19 +1,26 @@ """ Background consumer thread for a single consumer in a consumer group. -Each worker owns a daemon thread that loops on ``XREADGROUP`` with a -short block timeout and acks every entry it processes. Two demo-only -levers are wired into the loop: +Each worker owns a daemon thread that loops on ``XREADGROUP >`` with a +short block timeout and acks every entry it processes. 
Recovery of +stuck PEL entries (this consumer's, or anyone else's) happens through +``reap_idle_pel()``, which is the textbook Streams pattern: each +consumer periodically (or on demand) calls ``XAUTOCLAIM`` with itself +as the target, then processes whatever it claimed. The demo's +"XAUTOCLAIM to selected" button is exactly that call. + +Two demo-only levers are wired into the loop: * ``pause()`` parks the worker (so its pending entries age into the - ``XAUTOCLAIM`` window). + ``XAUTOCLAIM`` window without being consumed by ``>`` reads). * ``crash_next(n)`` tells the worker to drop its next ``n`` deliveries on the floor without acking them — the same effect as a worker process dying mid-message. Those entries stay in the group's PEL - until claimed. + until ``reap_idle_pel`` recovers them. -Real consumers do not need either lever; they only need the -``XREADGROUP`` -> process -> ``XACK`` loop in ``_run``. +Real consumers do not need either lever; they only need +``XREADGROUP`` → process → ``XACK`` in ``_run`` and a periodic +``reap_idle_pel`` call to recover stuck entries. """ from __future__ import annotations @@ -45,6 +52,7 @@ def __init__( self._recent: deque[dict] = deque(maxlen=recent_capacity) self._lock = threading.Lock() self._processed = 0 + self._reaped = 0 self._crashed_drops = 0 self._paused = threading.Event() @@ -106,12 +114,52 @@ def status(self) -> dict: "name": self.name, "group": self.group, "processed": self._processed, + "reaped": self._reaped, "crashed_drops": self._crashed_drops, "paused": self._paused.is_set(), "crash_queued": self._crash_next, "alive": bool(self._thread and self._thread.is_alive()), } + # ------------------------------------------------------------------ + # Recovery + # ------------------------------------------------------------------ + + def reap_idle_pel(self) -> dict: + """Run ``XAUTOCLAIM`` into self and process the claimed entries. 
+ + Returns a summary dict with ``claimed``, ``deleted_ids``, and + ``processed`` counts. Safe to call from any thread — the heavy + lifting is ``stream.autoclaim`` (a Redis call) and the + sequential per-entry handling via ``_handle_entry``. + + ``deleted_ids`` are PEL entries whose stream payload was + already trimmed by ``MAXLEN ~`` / ``XTRIM`` before the sweep + ran. Redis 7+ removes them from the PEL inside ``XAUTOCLAIM`` + itself, so the caller does not have to ``XACK`` them; they are + reported so the caller can route them to a dead-letter store. + """ + claimed, deleted = self.stream.autoclaim( + self.group, self.name, page_count=100, max_pages=10, + ) + processed = 0 + for entry_id, fields in claimed: + try: + self._handle_entry(entry_id, fields) + processed += 1 + except Exception as exc: + print( + f"[{self.group}/{self.name}] reap failed on " + f"{entry_id}: {exc}" + ) + with self._lock: + self._reaped += processed + return { + "claimed": len(claimed), + "deleted_ids": deleted, + "processed": processed, + } + # ------------------------------------------------------------------ # Main loop # ------------------------------------------------------------------ @@ -133,9 +181,32 @@ def _run(self) -> None: continue for entry_id, fields in entries: - if self.process_latency_ms: - time.sleep(self.process_latency_ms / 1000.0) - self._handle_entry(entry_id, fields) + self._dispatch(entry_id, fields) + + def _dispatch(self, entry_id: str, fields: dict[str, str]) -> None: + if self.process_latency_ms: + time.sleep(self.process_latency_ms / 1000.0) + try: + self._handle_entry(entry_id, fields) + except Exception as exc: + # A failure here (typically XACK against Redis) must not + # kill the daemon thread — that would silently halt this + # consumer while every other entry sat in its PEL waiting + # for XAUTOCLAIM. The entry stays unacked; the next + # ``reap_idle_pel`` call (here or on any consumer in the + # group) can recover it once it exceeds the idle threshold. 
+ print( + f"[{self.group}/{self.name}] failed to handle " + f"{entry_id}: {exc}" + ) + with self._lock: + self._recent.appendleft({ + "id": entry_id, + "type": fields.get("type", ""), + "fields": fields, + "acked": False, + "note": f"handler error: {exc}", + }) def _handle_entry(self, entry_id: str, fields: dict[str, str]) -> None: with self._lock: diff --git a/content/develop/use-cases/streaming/redis-py/demo_server.py b/content/develop/use-cases/streaming/redis-py/demo_server.py index 3a4b276610..32682c655e 100644 --- a/content/develop/use-cases/streaming/redis-py/demo_server.py +++ b/content/develop/use-cases/streaming/redis-py/demo_server.py @@ -26,7 +26,7 @@ import json import random import sys -import time +import threading from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer from pathlib import Path from urllib.parse import parse_qs, urlparse @@ -289,6 +289,19 @@ groupsView.innerHTML = "

No groups.

"; return; } + // Preserve any text the user has typed into an add-consumer input + // (and which one was focused) so the 1.5s auto-refresh doesn't wipe it. + const addWorkerValues = {}; + let focusedGroup = null; + let focusedSelectionStart = null; + groupsView.querySelectorAll("input[id^='addworker-']").forEach((input) => { + const group = input.id.slice("addworker-".length); + addWorkerValues[group] = input.value; + if (document.activeElement === input) { + focusedGroup = group; + focusedSelectionStart = input.selectionStart; + } + }); groupsView.innerHTML = groups.map((g) => { const consumers = (g.consumers_detail || []).map((c) => { const recent = (c.recent || []).slice(0, 3).map((m) => ` @@ -302,7 +315,7 @@ return `
${escapeHtml(c.name)} - pending=${c.pending} idle=${c.idle_ms}ms processed=${c.processed} + pending=${c.pending} idle=${c.idle_ms}ms processed=${c.processed} reaped=${c.reaped ?? 0} ${badges.join(" ")} @@ -324,6 +337,21 @@
`; }).join(""); + // Restore the typed text (and focus) into the add-consumer inputs. + for (const [group, value] of Object.entries(addWorkerValues)) { + const input = document.getElementById(`addworker-${group}`); + if (input) input.value = value; + } + if (focusedGroup) { + const input = document.getElementById(`addworker-${focusedGroup}`); + if (input) { + input.focus(); + if (focusedSelectionStart !== null) { + try { input.setSelectionRange(focusedSelectionStart, focusedSelectionStart); } catch (_) {} + } + } + } + // Populate the autoclaim-target dropdown with every (group, consumer) const previous = autoclaimTarget.value; const options = []; @@ -421,13 +449,24 @@ const body = new URLSearchParams({ group, consumer }); const r = await fetch("/autoclaim", { method: "POST", body }); const d = await r.json(); - setStatus(`XAUTOCLAIM reassigned ${d.claimed} entry/entries to ${group}/${consumer}.`, "ok"); - const rows = (d.entries || []).map((e) => ` - ${escapeHtml(e.id)}${escapeHtml(e.fields.type)}`).join(""); + if (!r.ok) { setStatus(d.error || "Autoclaim failed.", "error"); return; } + const deletedCount = (d.deleted || []).length; + const msg = deletedCount + ? `${group}/${consumer} reaped ${d.claimed} entry/entries and processed ${d.processed}; ${deletedCount} pending ID(s) were already trimmed out of the stream and removed from the PEL by Redis.` + : `${group}/${consumer} reaped ${d.claimed} entry/entries and processed ${d.processed}.`; + setStatus(msg, "ok"); + const deletedBlock = deletedCount + ? `

Deleted IDs (payload already trimmed — removed from PEL by Redis)

+

${(d.deleted || []).map(escapeHtml).join(", ")}

+

In production these would also be routed to a dead-letter store for offline inspection.

` + : ""; resultView.innerHTML = ` -

Claimed ${d.claimed} entry/entries idle ≥ ${d.min_idle_ms} ms into ${escapeHtml(group)}/${escapeHtml(consumer)}.

- ${rows ? `${rows}
idtype
` - : "

(nothing was idle enough yet — try again after a few seconds)

"}`; +

${escapeHtml(group)}/${escapeHtml(consumer)} ran XAUTOCLAIM + into itself with min_idle_time = ${d.min_idle_ms} ms, + claimed ${d.claimed} stuck entry/entries, processed + ${d.processed}, and acked them.

+ ${d.claimed === 0 ? "

(nothing was idle enough yet — try again after a few seconds)

" : ""} + ${deletedBlock}`; await refresh(); }); @@ -452,10 +491,17 @@ await refresh(); } else if (action === "remove") { const name = t.dataset.name; - if (!confirm(`Remove ${group}/${name}? Its pending entries will be released.`)) return; + if (!confirm(`Remove ${group}/${name}? Any pending entries it still owns will be handed over to a peer consumer in the group via XCLAIM before XGROUP DELCONSUMER.`)) return; const body = new URLSearchParams({ group, name }); - await fetch("/remove-worker", { method: "POST", body }); - setStatus(`Removed ${group}/${name}.`, "ok"); + const r = await fetch("/remove-worker", { method: "POST", body }); + const d = await r.json(); + if (!d.removed) { + setStatus(d.message || `Could not remove ${group}/${name} (${d.reason || "unknown"}).`, "error"); + } else if (d.handed_over_count > 0) { + setStatus(`Removed ${group}/${name}. Handed ${d.handed_over_count} pending entr${d.handed_over_count === 1 ? "y" : "ies"} over to ${d.handed_over_to}.`, "ok"); + } else { + setStatus(`Removed ${group}/${name} (no pending entries to hand over).`, "ok"); + } await refresh(); } else if (action === "add") { const input = document.getElementById(`addworker-${group}`); @@ -480,51 +526,102 @@ class StreamingDemo: - """In-memory registry of consumer workers across all groups.""" + """In-memory registry of consumer workers across all groups. + + ``ThreadingHTTPServer`` dispatches each HTTP request on a fresh + thread, so any code that mutates ``self.workers`` (or iterates it + while another handler is mutating it) needs the lock. 
+ """ def __init__(self, stream: RedisEventStream) -> None: self.stream = stream self.workers: dict[tuple[str, str], ConsumerWorker] = {} + self._lock = threading.RLock() def seed(self, groups: dict[str, list[str]]) -> int: - for group, names in groups.items(): - self.stream.ensure_group(group, start_id="0-0") - for name in names: - self.add_worker(group, name) - return sum(len(v) for v in groups.values()) + with self._lock: + for group, names in groups.items(): + self.stream.ensure_group(group, start_id="0-0") + for name in names: + self.add_worker(group, name) + return sum(len(v) for v in groups.values()) def add_worker(self, group: str, name: str) -> bool: - key = (group, name) - if key in self.workers: - return False - self.stream.ensure_group(group, start_id="0-0") - worker = ConsumerWorker(self.stream, group=group, name=name) - worker.start() - self.workers[key] = worker - return True - - def remove_worker(self, group: str, name: str) -> bool: - key = (group, name) - worker = self.workers.pop(key, None) - if worker is None: - return False - worker.stop() - self.stream.delete_consumer(group, name) - return True + with self._lock: + key = (group, name) + if key in self.workers: + return False + self.stream.ensure_group(group, start_id="0-0") + worker = ConsumerWorker(self.stream, group=group, name=name) + worker.start() + self.workers[key] = worker + return True + + def remove_worker(self, group: str, name: str) -> dict: + """Remove a consumer safely. + + ``XGROUP DELCONSUMER`` destroys the consumer's PEL entries + outright, so any pending message it still owned would become + unreachable. Before deleting, hand its PEL off to another + consumer in the same group with ``XCLAIM``. Without a peer + consumer to take over, refuse to delete and leave the worker + in place so the user can add a peer first. 
+ """ + with self._lock: + key = (group, name) + worker = self.workers.get(key) + if worker is None: + return {"removed": False, "reason": "not-found"} + + peers = [ + n for (g, n) in self.workers if g == group and n != name + ] + if not peers: + return { + "removed": False, + "reason": "no-peer", + "message": ( + f"{group}/{name} still owns pending entries and is the only " + "consumer in its group; add another consumer first so its " + "PEL can be handed over before deletion." + ), + } + + handover_target = peers[0] + claimed = self.stream.handover_pending( + group, from_consumer=name, to_consumer=handover_target, + ) + + self.workers.pop(key, None) + worker.stop() + self.stream.delete_consumer(group, name) + return { + "removed": True, + "handed_over_to": handover_target, + "handed_over_count": len(claimed), + } def get_worker(self, group: str, name: str) -> ConsumerWorker | None: - return self.workers.get((group, name)) + with self._lock: + return self.workers.get((group, name)) + + def workers_snapshot(self) -> list[tuple[tuple[str, str], ConsumerWorker]]: + """Stable list of (key, worker) safe to iterate outside the lock.""" + with self._lock: + return list(self.workers.items()) def stop_all(self) -> None: - for worker in list(self.workers.values()): - worker.stop() - self.workers.clear() + with self._lock: + for worker in list(self.workers.values()): + worker.stop() + self.workers.clear() def reset(self) -> int: - self.stop_all() - self.stream.delete_stream() - self.stream.reset_stats() - return self.seed(DEFAULT_GROUPS) + with self._lock: + self.stop_all() + self.stream.delete_stream() + self.stream.reset_stats() + return self.seed(DEFAULT_GROUPS) class StreamingDemoHandler(BaseHTTPRequestHandler): @@ -602,8 +699,9 @@ def _handle_remove_worker(self) -> None: params = self._read_form_data() group = params.get("group", [""])[0].strip() name = params.get("name", [""])[0].strip() - removed = self.demo.remove_worker(group, name) - self._send_json({"removed": 
removed}, 200) + result = self.demo.remove_worker(group, name) + status = 200 if result.get("removed") or result.get("reason") == "not-found" else 409 + self._send_json(result, status) def _handle_crash(self) -> None: params = self._read_form_data() @@ -618,20 +716,37 @@ def _handle_crash(self) -> None: self._send_json({"queued": count}, 200) def _handle_autoclaim(self) -> None: + """Have the chosen consumer reap stuck PEL entries into itself. + + This is the textbook ``XAUTOCLAIM`` recovery flow: each + consumer periodically calls ``XAUTOCLAIM`` with itself as the + target, then processes whatever was returned. The demo + exposes it as a manual button so you can trigger the reap on + a chosen consumer after waiting for the idle threshold. + """ params = self._read_form_data() group = params.get("group", [""])[0].strip() consumer = params.get("consumer", [""])[0].strip() if not group or not consumer: self._send_json({"error": "group and consumer are required"}, 400) return - claimed = self.stream.autoclaim(group, consumer, count=100) - entries = [ - {"id": entry_id, "fields": fields} for entry_id, fields in claimed - ] + worker = self.demo.get_worker(group, consumer) + if worker is None: + self._send_json({"error": f"unknown consumer {group}/{consumer}"}, 404) + return + # ``reap_idle_pel`` runs XAUTOCLAIM(self) + process + ack. + # ``deleted_ids`` are PEL entries whose stream payload was + # already trimmed by ``MAXLEN ~`` before the sweep ran. Redis + # 7+ removes them from the PEL inside XAUTOCLAIM itself, so + # the caller doesn't have to XACK them; in production they + # would be routed to a dead-letter store for offline + # inspection. 
+ result = worker.reap_idle_pel() self._send_json( { - "claimed": len(claimed), - "entries": entries, + "claimed": result["claimed"], + "processed": result["processed"], + "deleted": result["deleted_ids"], "min_idle_ms": self.stream.claim_min_idle_ms, }, 200, @@ -669,11 +784,14 @@ def _build_state(self) -> dict: groups_detail = [] pending_rows: list[dict] = [] + # Snapshot the workers dict under the demo's lock once per state + # build so concurrent add/remove requests can't change it mid-loop. + workers_snapshot = self.demo.workers_snapshot() for group in groups: name = group["name"] consumer_info = {c["name"]: c for c in self.stream.info_consumers(name)} consumers_detail = [] - for (g_name, c_name), worker in self.demo.workers.items(): + for (g_name, c_name), worker in workers_snapshot: if g_name != name: continue info = consumer_info.get(c_name, {}) @@ -706,8 +824,11 @@ def _build_state(self) -> dict: for row in self.stream.pending_detail(name, count=50): pending_rows.append({**row, "group": name}) - tail_entries = self.stream.replay("-", "+", count=10) - tail_entries = list(reversed(tail_entries)) # newest first + # XREVRANGE returns the *newest* N entries (in reverse order) — the + # tail view wants the most recent activity, not the head of history. + tail_entries = list(self.stream.redis.xrevrange( + self.stream.stream_key, max="+", min="-", count=10, + )) tail = [ {"id": entry_id, "fields": fields} for entry_id, fields in tail_entries ] @@ -782,6 +903,16 @@ def parse_args() -> argparse.Namespace: default=5000, help="Minimum idle time before XAUTOCLAIM may reassign a pending entry", ) + parser.add_argument( + "--no-reset", + dest="reset_on_start", + action="store_false", + help=( + "Keep any existing data at --stream-key instead of deleting it" + " on startup. By default the demo wipes the stream so each run" + " starts from an empty state." 
+ ), + ) return parser.parse_args() @@ -800,7 +931,12 @@ def main() -> None: claim_min_idle_ms=args.claim_idle_ms, ) demo = StreamingDemo(stream) - stream.delete_stream() + if args.reset_on_start: + print( + f"Deleting any existing data at key '{args.stream_key}'" + " for a clean demo run (pass --no-reset to keep it)." + ) + stream.delete_stream() seeded = demo.seed(DEFAULT_GROUPS) StreamingDemoHandler.stream = stream diff --git a/content/develop/use-cases/streaming/redis-py/event_stream.py b/content/develop/use-cases/streaming/redis-py/event_stream.py index adb9b0bea9..eca95bd28c 100644 --- a/content/develop/use-cases/streaming/redis-py/event_stream.py +++ b/content/develop/use-cases/streaming/redis-py/event_stream.py @@ -2,19 +2,25 @@ Redis event-stream helper backed by a single Redis Stream. Producers append events with ``XADD``. Consumers belong to consumer -groups and read with ``XREADGROUP``, which gives each consumer a -private cursor and a pending-entries list (PEL) of in-flight messages. +groups and read with ``XREADGROUP``. The group as a whole tracks a +single ``last-delivered-id`` cursor, and each consumer gets its own +pending-entries list (PEL) of in-flight messages it has been handed. Once a consumer has processed an entry it acknowledges it with ``XACK``; entries left unacknowledged past an idle threshold can be swept to a healthy consumer with ``XAUTOCLAIM`` (or to a specific one with ``XCLAIM``). Each ``XADD`` carries an approximate ``MAXLEN`` so the stream stays -bounded as it rolls forward. ``XRANGE`` supports replay from any point -in history for debugging, audit, or rebuilding a downstream projection. +bounded as it rolls forward. ``XRANGE`` supports replay over the +retained history for debugging, audit, or rebuilding a downstream +projection. 
Note that approximate trimming can release entries that +are still in a group's PEL: those entries appear in ``XAUTOCLAIM``'s +deleted-IDs list, which the caller should log and route to a +dead-letter store. Redis 7+ removes them from the PEL inside the +``XAUTOCLAIM`` call itself, so no explicit ``XACK`` is needed. The same stream can be read by any number of consumer groups — each -group has its own cursor and its own pending list, so analytics, +group has its own cursor and its own pending lists, so analytics, notifications, and audit can all process the full event flow at their own pace without coordinating with each other. """ @@ -128,8 +134,8 @@ def consume( The ``>`` ID means "deliver entries this consumer group has not delivered to *anyone* yet" — that is the at-least-once path. Replaying an explicit ID instead would re-deliver an entry that - is already in this consumer's pending list (used to recover - after a crash on the same consumer name). + is already in this consumer's pending list (see + ``consume_own_pel`` for that recovery path). """ result = self.redis.xreadgroup( group, @@ -140,6 +146,29 @@ def consume( ) return _flatten_entries(result) + def consume_own_pel( + self, + group: str, + consumer: str, + count: int = 10, + ) -> list[Entry]: + """Re-deliver entries already in this consumer's PEL. + + Reading with an explicit ID (``0`` here) instead of ``>`` + replays the entries already assigned to this consumer name + without advancing the group's ``last-delivered-id``. This is + the canonical recovery path after a crash on the same + consumer name, and is also how a consumer picks up entries + that another consumer (or ``XAUTOCLAIM``) handed to it. 
+ """ + result = self.redis.xreadgroup( + group, + consumer, + {self.stream_key: "0"}, + count=count, + ) + return _flatten_entries(result) + def ack(self, group: str, ids: Iterable[str]) -> int: ids = list(ids) if not ids: @@ -153,31 +182,57 @@ def autoclaim( self, group: str, consumer: str, - count: int = 100, + page_count: int = 100, start_id: str = "0-0", - ) -> list[Entry]: + max_pages: int = 10, + ) -> tuple[list[Entry], list[str]]: """Sweep idle pending entries to ``consumer``. - ``XAUTOCLAIM`` walks the group's PEL from ``start_id`` and - reassigns every entry that has been idle for at least - ``claim_min_idle_ms`` to the named consumer. The reassigned - entry's delivery counter is incremented so a poison-pill - message can be detected after a few claim cycles. + A single ``XAUTOCLAIM`` call scans up to ``page_count`` PEL + entries starting at ``start_id`` and returns a continuation + cursor. For a full sweep of the PEL, loop until the cursor + returns to ``0-0`` (or hit ``max_pages`` as a safety net so a + very large PEL can't monopolise the call). + + Returns ``(claimed, deleted_ids)``. ``deleted_ids`` are PEL + entries whose stream payload had already been trimmed by the + time this sweep ran (typically because ``MAXLEN ~`` retention + outran a slow consumer). ``XAUTOCLAIM`` removes those dangling + slots from the PEL itself — the caller does **not** need to + ``XACK`` them — but they cannot be retried, so log and route + them to a dead-letter store for observability. 
""" - _next_id, claimed, _deleted = self.redis.xautoclaim( - self.stream_key, - group, - consumer, - min_idle_time=self.claim_min_idle_ms, - start_id=start_id, - count=count, - ) + claimed_all: list[Entry] = [] + deleted_all: list[str] = [] + cursor = start_id + for _ in range(max_pages): + next_id, claimed, deleted = self.redis.xautoclaim( + self.stream_key, + group, + consumer, + min_idle_time=self.claim_min_idle_ms, + start_id=cursor, + count=page_count, + ) + claimed_all.extend(claimed) + deleted_all.extend(deleted or []) + if next_id == "0-0": + break + cursor = next_id with self._stats_lock: - self._claimed_total += len(claimed) - return list(claimed) + self._claimed_total += len(claimed_all) + return claimed_all, deleted_all def delete_consumer(self, group: str, consumer: str) -> int: - """Remove a consumer from a group. Its pending entries are released.""" + """Drop a consumer from a group. + + ``XGROUP DELCONSUMER`` destroys this consumer's PEL entries — + any entry it still owned is no longer tracked anywhere in the + group, and ``XAUTOCLAIM`` will never find it again. Always + ``handover_pending`` (or ``XCLAIM`` it manually) to a healthy + consumer first; this method is the raw destructive call and + is exposed only for explicit cleanup. + """ try: return int(self.redis.xgroup_delconsumer( self.stream_key, group, consumer, @@ -185,6 +240,44 @@ def delete_consumer(self, group: str, consumer: str) -> int: except redis.ResponseError: return 0 + def handover_pending( + self, + group: str, + from_consumer: str, + to_consumer: str, + batch: int = 100, + ) -> list[Entry]: + """Move every PEL entry owned by ``from_consumer`` to ``to_consumer``. + + Enumerates the source consumer's PEL with ``XPENDING ... CONSUMER`` + and reassigns each ID with ``XCLAIM`` at zero idle time so the + move is unconditional. (``XAUTOCLAIM`` does not filter by source + consumer, so it cannot be used for a per-consumer handover.) 
+ + Call this before ``delete_consumer`` whenever the source still + has pending entries — otherwise ``XGROUP DELCONSUMER`` would + silently destroy them and they could never be recovered. + """ + claimed_all: list[Entry] = [] + while True: + rows = self.redis.xpending_range( + self.stream_key, group, min="-", max="+", + count=batch, consumername=from_consumer, + ) + if not rows: + break + ids = [row["message_id"] for row in rows] + claimed = self.redis.xclaim( + self.stream_key, group, to_consumer, + min_idle_time=0, message_ids=ids, + ) + claimed_all.extend(claimed) + if len(rows) < batch: + break + with self._stats_lock: + self._claimed_total += len(claimed_all) + return claimed_all + # ------------------------------------------------------------------ # Replay, length, trim # ------------------------------------------------------------------ From 0a31d112498f0509407a4719746e81a9c10be33a Mon Sep 17 00:00:00 2001 From: Andy Stark Date: Thu, 14 May 2026 16:05:16 +0100 Subject: [PATCH 3/4] DOC-6619 draft of other client examples --- .../streaming/dotnet/ConsumerWorker.cs | 351 +++++ .../use-cases/streaming/dotnet/EventStream.cs | 547 +++++++ .../use-cases/streaming/dotnet/Program.cs | 992 +++++++++++++ .../streaming/dotnet/StreamingDemo.csproj | 14 + .../use-cases/streaming/dotnet/_index.md | 507 +++++++ .../develop/use-cases/streaming/go/_index.md | 514 +++++++ .../use-cases/streaming/go/consumer_worker.go | 338 +++++ .../use-cases/streaming/go/demo_server.go | 1167 +++++++++++++++ .../use-cases/streaming/go/event_stream.go | 679 +++++++++ content/develop/use-cases/streaming/go/go.mod | 11 + content/develop/use-cases/streaming/go/go.sum | 22 + .../streaming/java-jedis/ConsumerWorker.java | 308 ++++ .../streaming/java-jedis/DemoServer.java | 1215 ++++++++++++++++ .../streaming/java-jedis/EventStream.java | 671 +++++++++ .../use-cases/streaming/java-jedis/_index.md | 502 +++++++ .../java-lettuce/ConsumerWorker.java | 327 +++++ 
.../streaming/java-lettuce/DemoServer.java | 1165 +++++++++++++++ .../streaming/java-lettuce/EventStream.java | 655 +++++++++ .../streaming/java-lettuce/_index.md | 477 +++++++ .../use-cases/streaming/nodejs/_index.md | 494 +++++++ .../streaming/nodejs/consumerWorker.js | 321 +++++ .../use-cases/streaming/nodejs/demoServer.js | 1156 +++++++++++++++ .../use-cases/streaming/nodejs/eventStream.js | 566 ++++++++ .../use-cases/streaming/nodejs/package.json | 16 + .../streaming/php/ConsumerWorker.php | 483 +++++++ .../use-cases/streaming/php/EventStream.php | 674 +++++++++ .../develop/use-cases/streaming/php/_index.md | 545 +++++++ .../use-cases/streaming/php/composer.json | 8 + .../use-cases/streaming/php/demo_server.php | 1200 ++++++++++++++++ .../streaming/redis-py/demo_server.py | 4 +- .../use-cases/streaming/ruby/_index.md | 461 ++++++ .../streaming/ruby/consumer_worker.rb | 251 ++++ .../use-cases/streaming/ruby/demo_server.rb | 959 +++++++++++++ .../use-cases/streaming/ruby/event_stream.rb | 370 +++++ .../use-cases/streaming/rust/Cargo.toml | 16 + .../use-cases/streaming/rust/_index.md | 548 +++++++ .../streaming/rust/consumer_worker.rs | 393 +++++ .../use-cases/streaming/rust/demo_server.rs | 1261 +++++++++++++++++ .../use-cases/streaming/rust/event_stream.rs | 736 ++++++++++ 39 files changed, 20922 insertions(+), 2 deletions(-) create mode 100644 content/develop/use-cases/streaming/dotnet/ConsumerWorker.cs create mode 100644 content/develop/use-cases/streaming/dotnet/EventStream.cs create mode 100644 content/develop/use-cases/streaming/dotnet/Program.cs create mode 100644 content/develop/use-cases/streaming/dotnet/StreamingDemo.csproj create mode 100644 content/develop/use-cases/streaming/dotnet/_index.md create mode 100644 content/develop/use-cases/streaming/go/_index.md create mode 100644 content/develop/use-cases/streaming/go/consumer_worker.go create mode 100644 content/develop/use-cases/streaming/go/demo_server.go create mode 100644 
content/develop/use-cases/streaming/go/event_stream.go create mode 100644 content/develop/use-cases/streaming/go/go.mod create mode 100644 content/develop/use-cases/streaming/go/go.sum create mode 100644 content/develop/use-cases/streaming/java-jedis/ConsumerWorker.java create mode 100644 content/develop/use-cases/streaming/java-jedis/DemoServer.java create mode 100644 content/develop/use-cases/streaming/java-jedis/EventStream.java create mode 100644 content/develop/use-cases/streaming/java-jedis/_index.md create mode 100644 content/develop/use-cases/streaming/java-lettuce/ConsumerWorker.java create mode 100644 content/develop/use-cases/streaming/java-lettuce/DemoServer.java create mode 100644 content/develop/use-cases/streaming/java-lettuce/EventStream.java create mode 100644 content/develop/use-cases/streaming/java-lettuce/_index.md create mode 100644 content/develop/use-cases/streaming/nodejs/_index.md create mode 100644 content/develop/use-cases/streaming/nodejs/consumerWorker.js create mode 100644 content/develop/use-cases/streaming/nodejs/demoServer.js create mode 100644 content/develop/use-cases/streaming/nodejs/eventStream.js create mode 100644 content/develop/use-cases/streaming/nodejs/package.json create mode 100644 content/develop/use-cases/streaming/php/ConsumerWorker.php create mode 100644 content/develop/use-cases/streaming/php/EventStream.php create mode 100644 content/develop/use-cases/streaming/php/_index.md create mode 100644 content/develop/use-cases/streaming/php/composer.json create mode 100644 content/develop/use-cases/streaming/php/demo_server.php create mode 100644 content/develop/use-cases/streaming/ruby/_index.md create mode 100644 content/develop/use-cases/streaming/ruby/consumer_worker.rb create mode 100644 content/develop/use-cases/streaming/ruby/demo_server.rb create mode 100644 content/develop/use-cases/streaming/ruby/event_stream.rb create mode 100644 content/develop/use-cases/streaming/rust/Cargo.toml create mode 100644 
content/develop/use-cases/streaming/rust/_index.md create mode 100644 content/develop/use-cases/streaming/rust/consumer_worker.rs create mode 100644 content/develop/use-cases/streaming/rust/demo_server.rs create mode 100644 content/develop/use-cases/streaming/rust/event_stream.rs diff --git a/content/develop/use-cases/streaming/dotnet/ConsumerWorker.cs b/content/develop/use-cases/streaming/dotnet/ConsumerWorker.cs new file mode 100644 index 0000000000..4573aa055f --- /dev/null +++ b/content/develop/use-cases/streaming/dotnet/ConsumerWorker.cs @@ -0,0 +1,351 @@ +namespace StreamingDemo; + +/// +/// Lightweight in-memory record of one entry as the worker saw it. +/// +public sealed record ConsumerActivity( + string Id, + string Type, + Dictionary Fields, + bool Acked, + string Note); + +/// +/// Summary returned by . +/// +public sealed record ReapResult( + int Claimed, + IReadOnlyList DeletedIds, + int Processed); + +/// +/// Background consumer thread for a single consumer in a consumer group. +/// +/// Each worker owns a daemon thread that loops on XREADGROUP > on a short +/// poll interval and acks every entry it processes. Recovery of stuck PEL +/// entries (this consumer's, or anyone else's) happens through +/// , which is the textbook Streams pattern: each +/// consumer periodically (or on demand) calls XAUTOCLAIM with itself as +/// the target, then processes whatever it claimed. The demo's +/// "XAUTOCLAIM to selected" button is exactly that call. +/// +/// Two demo-only levers are wired into the loop: +/// +/// Pause parks the worker (so its pending +/// entries age into the XAUTOCLAIM window without being consumed by +/// > reads). +/// CrashNext(n) tells the worker to drop its +/// next n deliveries on the floor without acking them — the same +/// effect as a worker process dying mid-message. Those entries stay in +/// the group's PEL until recovers +/// them. 
+/// +/// Real consumers do not need either lever; they only need +/// XREADGROUP → process → XACK in the run loop and a periodic +/// call to recover stuck entries. +/// +public sealed class ConsumerWorker +{ + private readonly EventStream _stream; + private readonly int _processLatencyMs; + private readonly int _recentCapacity; + private readonly int _pollIntervalMs; + + private readonly object _lock = new(); + private readonly LinkedList _recent = new(); + private int _processed; + private int _reaped; + private int _crashedDrops; + private int _crashNext; + private bool _paused; + + private Thread? _thread; + private volatile bool _stop; + + public ConsumerWorker( + EventStream stream, + string group, + string name, + int processLatencyMs = 25, + int recentCapacity = 20, + int pollIntervalMs = 100) + { + _stream = stream ?? throw new ArgumentNullException(nameof(stream)); + Group = group; + Name = name; + _processLatencyMs = processLatencyMs; + _recentCapacity = recentCapacity; + _pollIntervalMs = pollIntervalMs; + } + + public string Group { get; } + public string Name { get; } + + public bool IsAlive => _thread is { IsAlive: true }; + + // ------------------------------------------------------------------ + // Lifecycle + // ------------------------------------------------------------------ + + public void Start() + { + lock (_lock) + { + if (_thread is { IsAlive: true }) + { + return; + } + _stop = false; + _thread = new Thread(Run) + { + IsBackground = true, + Name = $"consumer-{Group}-{Name}", + }; + _thread.Start(); + } + } + + public void Stop(int timeoutMs = 1000) + { + _stop = true; + var t = _thread; + if (t is not null && t.IsAlive) + { + t.Join(timeoutMs); + } + } + + // ------------------------------------------------------------------ + // Demo levers + // ------------------------------------------------------------------ + + public void Pause() + { + lock (_lock) + { + _paused = true; + } + } + + public void Resume() + { + lock (_lock) + { + 
_paused = false; + } + } + + /// + /// Drop the next deliveries without acking + /// them. The entries stay in the group's PEL with their delivery + /// counter incremented, so XAUTOCLAIM can recover them once they + /// exceed the idle threshold. + /// + public void CrashNext(int count) + { + lock (_lock) + { + _crashNext += Math.Max(0, count); + } + } + + // ------------------------------------------------------------------ + // Introspection + // ------------------------------------------------------------------ + + public List Recent() + { + lock (_lock) + { + return _recent.ToList(); + } + } + + public Dictionary Status() + { + lock (_lock) + { + return new Dictionary + { + ["name"] = Name, + ["group"] = Group, + ["processed"] = _processed, + ["reaped"] = _reaped, + ["crashed_drops"] = _crashedDrops, + ["paused"] = _paused, + ["crash_queued"] = _crashNext, + ["alive"] = IsAlive, + }; + } + } + + // ------------------------------------------------------------------ + // Recovery + // ------------------------------------------------------------------ + + /// + /// Run XAUTOCLAIM into self and process the claimed entries. + /// Returns a summary with Claimed, DeletedIds, and + /// Processed counts. Safe to call from any thread — the + /// heavy lifting is (a Redis + /// call) and the sequential per-entry dispatch via + /// . + /// + /// DeletedIds are PEL entries whose stream payload was + /// already trimmed by MAXLEN ~ / XTRIM before the sweep ran. + /// Redis 7+ removes them from the PEL inside XAUTOCLAIM itself, + /// so the caller does not have to XACK them; they are reported so + /// the caller can route them to a dead-letter store. 
+ /// + public ReapResult ReapIdlePel() + { + var swept = _stream.Autoclaim(Group, Name, pageCount: 100, maxPages: 10); + var processed = 0; + foreach (var record in swept.Claimed) + { + try + { + HandleEntry(record.Id, record.Fields); + processed++; + } + catch (Exception ex) + { + Console.Error.WriteLine($"[{Group}/{Name}] reap failed on {record.Id}: {ex.Message}"); + } + } + lock (_lock) + { + _reaped += processed; + } + return new ReapResult(swept.Claimed.Count, swept.DeletedIds, processed); + } + + // ------------------------------------------------------------------ + // Main loop + // ------------------------------------------------------------------ + + private void Run() + { + while (!_stop) + { + bool paused; + lock (_lock) + { + paused = _paused; + } + if (paused) + { + Thread.Sleep(50); + continue; + } + + List entries; + try + { + entries = _stream.Consume(Group, Name, count: 10); + } + catch (Exception ex) + { + // Don't kill the thread on a transient Redis error; a + // real consumer would log this and back off. + Console.Error.WriteLine($"[{Group}/{Name}] read failed: {ex.Message}"); + Thread.Sleep(500); + continue; + } + + if (entries.Count == 0) + { + // StackExchange.Redis' XREADGROUP wrapper is non-blocking, + // so we poll on a short interval. A blocking BLOCK option + // would otherwise monopolise the multiplexer's pipeline. + Thread.Sleep(_pollIntervalMs); + continue; + } + + foreach (var entry in entries) + { + Dispatch(entry.Id, entry.Fields); + } + } + } + + private void Dispatch(string entryId, Dictionary fields) + { + if (_processLatencyMs > 0) + { + Thread.Sleep(_processLatencyMs); + } + try + { + HandleEntry(entryId, fields); + } + catch (Exception ex) + { + // A failure here (typically XACK against Redis) must not + // kill the daemon thread — that would silently halt this + // consumer while every other entry sat in its PEL waiting + // for XAUTOCLAIM. 
The entry stays unacked; the next + // ReapIdlePel call (here or on any consumer in the group) + // can recover it once it exceeds the idle threshold. + Console.Error.WriteLine($"[{Group}/{Name}] failed to handle {entryId}: {ex.Message}"); + lock (_lock) + { + Push(new ConsumerActivity( + entryId, + fields.TryGetValue("type", out var type) ? type : "", + fields, + Acked: false, + Note: $"handler error: {ex.Message}")); + } + } + } + + private void HandleEntry(string entryId, Dictionary fields) + { + bool drop; + lock (_lock) + { + drop = _crashNext > 0; + if (drop) + { + _crashNext--; + } + } + + if (drop) + { + lock (_lock) + { + _crashedDrops++; + Push(new ConsumerActivity( + entryId, + fields.TryGetValue("type", out var type) ? type : "", + fields, + Acked: false, + Note: "dropped (simulated crash)")); + } + return; + } + + _stream.Ack(Group, new[] { entryId }); + lock (_lock) + { + _processed++; + Push(new ConsumerActivity( + entryId, + fields.TryGetValue("type", out var type) ? type : "", + fields, + Acked: true, + Note: "")); + } + } + + private void Push(ConsumerActivity activity) + { + _recent.AddFirst(activity); + while (_recent.Count > _recentCapacity) + { + _recent.RemoveLast(); + } + } +} diff --git a/content/develop/use-cases/streaming/dotnet/EventStream.cs b/content/develop/use-cases/streaming/dotnet/EventStream.cs new file mode 100644 index 0000000000..da70a27695 --- /dev/null +++ b/content/develop/use-cases/streaming/dotnet/EventStream.cs @@ -0,0 +1,547 @@ +using StackExchange.Redis; + +namespace StreamingDemo; + +/// +/// One entry as it comes off the stream: a stream ID and a flat +/// field/value dictionary of strings. +/// +public sealed record StreamRecord(string Id, Dictionary Fields); + +/// +/// A single pending-entries-list (PEL) row, as returned by XPENDING. 
+/// +public sealed record PendingEntry( + string Id, + string Consumer, + long IdleMs, + int Deliveries); + +/// +/// Result of an XAUTOCLAIM sweep: every claimed entry plus any IDs +/// whose stream payload had already been trimmed by MAXLEN ~. +/// +public sealed record AutoClaimResult( + IReadOnlyList Claimed, + IReadOnlyList DeletedIds); + +/// +/// Producer/consumer helper for a single Redis Stream with consumer +/// groups. Producers append events with XADD; consumers belong to +/// consumer groups and read with XREADGROUP. Each consumer gets its +/// own pending-entries list (PEL) of in-flight messages it has been +/// handed; once a consumer has processed an entry it acknowledges it +/// with XACK. Entries left unacknowledged past an idle threshold can +/// be swept to a healthy consumer with XAUTOCLAIM (or to a specific +/// one with XCLAIM). +/// +public sealed class EventStream +{ + private readonly IDatabase _db; + private readonly string _streamKey; + private readonly int _maxlenApprox; + private readonly long _claimMinIdleMs; + + private long _producedTotal; + private long _ackedTotal; + private long _claimedTotal; + + public EventStream( + IDatabase db, + string streamKey = "demo:events:orders", + int maxlenApprox = 10_000, + long claimMinIdleMs = 15_000) + { + _db = db ?? throw new ArgumentNullException(nameof(db)); + _streamKey = streamKey; + _maxlenApprox = maxlenApprox; + _claimMinIdleMs = claimMinIdleMs; + } + + public string StreamKey => _streamKey; + public int MaxlenApprox => _maxlenApprox; + public long ClaimMinIdleMs => _claimMinIdleMs; + + // ------------------------------------------------------------------ + // Producer + // ------------------------------------------------------------------ + + /// Append a single event. Returns the stream ID Redis assigned. 
+ public string Produce(string eventType, IDictionary payload) + { + return ProduceBatch(new[] { (eventType, payload) })[0]; + } + + /// + /// Pipeline several XADD calls in one round trip. + /// Each entry carries an approximate MAXLEN cap. The "~" flavour + /// lets Redis trim at a macro-node boundary, which is much cheaper + /// than exact trimming and is the right call for a retention + /// guardrail rather than a hard size limit. + /// + public string[] ProduceBatch(IEnumerable<(string EventType, IDictionary Payload)> events) + { + var eventList = events.ToList(); + if (eventList.Count == 0) + { + return Array.Empty(); + } + + var batch = _db.CreateBatch(); + var addTasks = new Task[eventList.Count]; + for (var i = 0; i < eventList.Count; i++) + { + var (eventType, payload) = eventList[i]; + var pairs = EncodeFields(eventType, payload); + addTasks[i] = batch.StreamAddAsync( + _streamKey, + pairs, + messageId: null, + maxLength: _maxlenApprox, + useApproximateMaxLength: true); + } + batch.Execute(); + Task.WaitAll(addTasks); + + var ids = new string[addTasks.Length]; + for (var i = 0; i < addTasks.Length; i++) + { + ids[i] = (string)addTasks[i].Result!; + } + Interlocked.Add(ref _producedTotal, ids.Length); + return ids; + } + + private static NameValueEntry[] EncodeFields(string eventType, IDictionary payload) + { + var entries = new List(payload.Count + 2) + { + new("type", eventType), + new("ts_ms", DateTimeOffset.UtcNow.ToUnixTimeMilliseconds().ToString()), + }; + foreach (var kv in payload) + { + entries.Add(new NameValueEntry(kv.Key, kv.Value ?? "")); + } + return entries.ToArray(); + } + + // ------------------------------------------------------------------ + // Consumer groups + // ------------------------------------------------------------------ + + /// + /// Create the consumer group if it doesn't exist. + /// "$" means "deliver only events appended after this point"; + /// pass "0-0" to replay the entire stream into a fresh group. 
+ /// + public void EnsureGroup(string group, string startId = "$") + { + try + { + _db.StreamCreateConsumerGroup( + _streamKey, + group, + position: startId, + createStream: true); + } + catch (RedisServerException ex) when (ex.Message.Contains("BUSYGROUP", StringComparison.Ordinal)) + { + // Group already exists — nothing to do. + } + } + + public bool DeleteGroup(string group) + { + try + { + return _db.StreamDeleteConsumerGroup(_streamKey, group); + } + catch (RedisServerException) + { + return false; + } + } + + /// + /// Read new entries for this consumer via XREADGROUP. + /// The ">" ID means "deliver entries this consumer group has not + /// delivered to anyone yet" — that is the at-least-once path. + /// Replaying an explicit ID instead would re-deliver an entry that + /// is already in this consumer's pending list (see + /// for that recovery path). + /// Note: StackExchange.Redis' XREADGROUP wrapper is non-blocking, + /// so a real consumer loop polls on a short interval. + /// + public List Consume(string group, string consumer, int count = 10) + { + var entries = _db.StreamReadGroup( + _streamKey, + group, + consumer, + position: ">", + count: count); + return ToRecords(entries); + } + + /// + /// Re-deliver entries already in this consumer's PEL. + /// Reading with an explicit ID ("0" here) instead of ">" replays the + /// entries already assigned to this consumer name without advancing + /// the group's last-delivered-id. This is the canonical recovery + /// path after a crash on the same consumer name, and is also how a + /// consumer picks up entries that another consumer (or XAUTOCLAIM) + /// handed to it. 
+ /// + public List ConsumeOwnPel(string group, string consumer, int count = 10) + { + var entries = _db.StreamReadGroup( + _streamKey, + group, + consumer, + position: "0", + count: count); + return ToRecords(entries); + } + + public long Ack(string group, IEnumerable ids) + { + var idArray = ids.Select(id => (RedisValue)id).ToArray(); + if (idArray.Length == 0) + { + return 0; + } + var n = _db.StreamAcknowledge(_streamKey, group, idArray); + Interlocked.Add(ref _ackedTotal, n); + return n; + } + + /// + /// Sweep idle pending entries to . + /// A single XAUTOCLAIM call scans up to + /// PEL entries starting at and returns a + /// continuation cursor. For a full sweep of the PEL, loop until the + /// cursor returns to "0-0" (or hit as a + /// safety net so a very large PEL can't monopolise the call). + /// + /// are PEL entries whose + /// stream payload had already been trimmed by the time this sweep + /// ran (typically because MAXLEN ~ retention outran a slow consumer). + /// XAUTOCLAIM removes those dangling slots from the PEL itself — the + /// caller does not need to XACK them — but they cannot be + /// retried, so log and route them to a dead-letter store for + /// observability. 
+ /// + public AutoClaimResult Autoclaim( + string group, + string consumer, + int pageCount = 100, + string startId = "0-0", + int maxPages = 10) + { + var claimedAll = new List(); + var deletedAll = new List(); + var cursor = startId; + for (var i = 0; i < maxPages; i++) + { + var result = _db.StreamAutoClaim( + _streamKey, + group, + consumer, + minIdleTimeInMs: _claimMinIdleMs, + startAtId: cursor, + count: pageCount); + if (result.IsNull) + { + break; + } + foreach (var entry in result.ClaimedEntries) + { + claimedAll.Add(EntryToRecord(entry)); + } + if (result.DeletedIds != null) + { + foreach (var id in result.DeletedIds) + { + deletedAll.Add((string)id!); + } + } + var nextId = (string)result.NextStartId!; + if (nextId == "0-0") + { + break; + } + cursor = nextId; + } + Interlocked.Add(ref _claimedTotal, claimedAll.Count); + return new AutoClaimResult(claimedAll, deletedAll); + } + + /// + /// Drop a consumer from a group. + /// XGROUP DELCONSUMER destroys this consumer's PEL entries — any + /// entry it still owned is no longer tracked anywhere in the group, + /// and XAUTOCLAIM will never find it again. Always + /// (or XCLAIM it manually) to a + /// healthy consumer first; this method is the raw destructive call + /// and is exposed only for explicit cleanup. + /// + public long DeleteConsumer(string group, string consumer) + { + try + { + return _db.StreamDeleteConsumer(_streamKey, group, consumer); + } + catch (RedisServerException) + { + return 0; + } + } + + /// + /// Move every PEL entry owned by to + /// . + /// Enumerates the source consumer's PEL with XPENDING ... CONSUMER + /// and reassigns each ID with XCLAIM at zero idle time so the move + /// is unconditional. (XAUTOCLAIM does not filter by source consumer, + /// so it cannot be used for a per-consumer handover.) Call this + /// before whenever the source still has + /// pending entries — otherwise XGROUP DELCONSUMER would silently + /// destroy them and they could never be recovered. 
+ /// + public int HandoverPending( + string group, + string fromConsumer, + string toConsumer, + int batch = 100) + { + var totalClaimed = 0; + while (true) + { + // Errors from XPENDING / XCLAIM propagate up to the caller. + // Swallowing them here and returning a partial count would + // let the caller think the handover succeeded; the caller's + // next step is XGROUP DELCONSUMER, which would destroy + // whatever entries were left in the source's PEL. + var rows = _db.StreamPendingMessages( + _streamKey, + group, + batch, + consumerName: fromConsumer); + if (rows.Length == 0) + { + break; + } + var ids = rows.Select(r => r.MessageId).ToArray(); + var claimed = _db.StreamClaim( + _streamKey, + group, + toConsumer, + minIdleTimeInMs: 0, + messageIds: ids); + totalClaimed += claimed.Length; + if (rows.Length < batch) + { + break; + } + } + Interlocked.Add(ref _claimedTotal, totalClaimed); + return totalClaimed; + } + + // ------------------------------------------------------------------ + // Replay, length, trim + // ------------------------------------------------------------------ + + /// + /// Range read with XRANGE for replay or audit. + /// Read-only: ranges do not update any group cursor and do not ack + /// anything. Useful for bootstrapping a new projection, for + /// building an audit view, or for debugging what actually went + /// through the stream. + /// + public List Replay(string startId = "-", string endId = "+", int count = 100) + { + var entries = _db.StreamRange( + _streamKey, + minId: startId, + maxId: endId, + count: count); + return entries.Select(EntryToRecord).ToList(); + } + + /// + /// Newest N entries via XREVRANGE — handy for a tail-style view. 
+ /// + public List Tail(int count = 10) + { + var entries = _db.StreamRange( + _streamKey, + minId: "-", + maxId: "+", + count: count, + messageOrder: Order.Descending); + return entries.Select(EntryToRecord).ToList(); + } + + public long Length() + { + try + { + return _db.StreamLength(_streamKey); + } + catch (RedisServerException) + { + return 0; + } + } + + public long TrimMaxlen(int maxlen) + { + return _db.StreamTrim(_streamKey, maxlen, useApproximateMaxLength: true); + } + + public long TrimMinid(string minid) + { + // StreamTrim only exposes the MAXLEN variant; reach for the raw + // command for MINID. + var raw = _db.Execute("XTRIM", _streamKey, "MINID", "~", minid); + return (long)raw; + } + + // ------------------------------------------------------------------ + // Inspection + // ------------------------------------------------------------------ + + public Dictionary InfoStream() + { + try + { + var info = _db.StreamInfo(_streamKey); + return new Dictionary + { + ["length"] = info.Length, + ["last_generated_id"] = (string?)info.LastGeneratedId, + ["first_entry_id"] = info.FirstEntry.IsNull ? null : (string?)info.FirstEntry.Id, + ["last_entry_id"] = info.LastEntry.IsNull ? 
null : (string?)info.LastEntry.Id, + }; + } + catch (RedisServerException) + { + return new Dictionary + { + ["length"] = 0L, + ["last_generated_id"] = null, + ["first_entry_id"] = null, + ["last_entry_id"] = null, + }; + } + } + + public List> InfoGroups() + { + try + { + var groups = _db.StreamGroupInfo(_streamKey); + return groups.Select(g => new Dictionary + { + ["name"] = (string)g.Name!, + ["consumers"] = (long)g.ConsumerCount, + ["pending"] = (long)g.PendingMessageCount, + ["last_delivered_id"] = (string?)g.LastDeliveredId, + ["lag"] = g.Lag, + }).ToList(); + } + catch (RedisServerException) + { + return new List>(); + } + } + + public List> InfoConsumers(string group) + { + try + { + var consumers = _db.StreamConsumerInfo(_streamKey, group); + return consumers.Select(c => new Dictionary + { + ["name"] = (string)c.Name!, + ["pending"] = (long)c.PendingMessageCount, + ["idle_ms"] = c.IdleTimeInMilliseconds, + }).ToList(); + } + catch (RedisServerException) + { + return new List>(); + } + } + + /// Per-entry PEL view (id, consumer, idle, deliveries). 
+ public List PendingDetail(string group, int count = 20) + { + try + { + var rows = _db.StreamPendingMessages(_streamKey, group, count, consumerName: RedisValue.Null); + return rows.Select(r => new PendingEntry( + (string)r.MessageId!, + (string)r.ConsumerName!, + r.IdleTimeInMilliseconds, + r.DeliveryCount)).ToList(); + } + catch (RedisServerException) + { + return new List(); + } + } + + public Dictionary Stats() + { + return new Dictionary + { + ["produced_total"] = Interlocked.Read(ref _producedTotal), + ["acked_total"] = Interlocked.Read(ref _ackedTotal), + ["claimed_total"] = Interlocked.Read(ref _claimedTotal), + }; + } + + public void ResetStats() + { + Interlocked.Exchange(ref _producedTotal, 0); + Interlocked.Exchange(ref _ackedTotal, 0); + Interlocked.Exchange(ref _claimedTotal, 0); + } + + // ------------------------------------------------------------------ + // Demo housekeeping + // ------------------------------------------------------------------ + + /// Drop the stream key entirely. Used by the demo's reset path. + public void DeleteStream() + { + _db.KeyDelete(_streamKey); + } + + // ------------------------------------------------------------------ + // Helpers + // ------------------------------------------------------------------ + + private static List ToRecords(StreamEntry[] entries) + { + var list = new List(entries.Length); + foreach (var entry in entries) + { + list.Add(EntryToRecord(entry)); + } + return list; + } + + private static StreamRecord EntryToRecord(StreamEntry entry) + { + var fields = new Dictionary(entry.Values.Length, StringComparer.Ordinal); + foreach (var value in entry.Values) + { + fields[(string)value.Name!] = (string?)value.Value ?? 
""; + } + return new StreamRecord((string)entry.Id!, fields); + } +} diff --git a/content/develop/use-cases/streaming/dotnet/Program.cs b/content/develop/use-cases/streaming/dotnet/Program.cs new file mode 100644 index 0000000000..28fcf6e058 --- /dev/null +++ b/content/develop/use-cases/streaming/dotnet/Program.cs @@ -0,0 +1,992 @@ +using System.Text.Json; +using StackExchange.Redis; +using StreamingDemo; + +// .NET grows its ThreadPool gradually, which can starve polling threads +// in the consumer workers when many groups/consumers run concurrently. +// Raising the floor up front keeps the demo's behaviour clean. A +// production helper would more naturally be async (using StreamAddAsync, +// StreamRangeAsync, ScriptEvaluateAsync, etc.) and avoid this entirely. +ThreadPool.SetMinThreads(64, 64); + +var port = 8785; +var redisHost = "localhost"; +var redisPort = 6379; +var streamKey = "demo:events:orders"; +var maxlen = 2000; +var claimIdleMs = 5000L; +var resetOnStart = true; + +for (var i = 0; i < args.Length; i++) +{ + switch (args[i]) + { + case "--port" when i + 1 < args.Length: port = int.Parse(args[++i]); break; + case "--redis-host" when i + 1 < args.Length: redisHost = args[++i]; break; + case "--redis-port" when i + 1 < args.Length: redisPort = int.Parse(args[++i]); break; + case "--stream-key" when i + 1 < args.Length: streamKey = args[++i]; break; + case "--maxlen" when i + 1 < args.Length: maxlen = int.Parse(args[++i]); break; + case "--claim-idle-ms" when i + 1 < args.Length: claimIdleMs = long.Parse(args[++i]); break; + case "--no-reset": resetOnStart = false; break; + } +} + +port = int.TryParse(Environment.GetEnvironmentVariable("PORT"), out var envPort) ? envPort : port; +redisHost = Environment.GetEnvironmentVariable("REDIS_HOST") ?? redisHost; +redisPort = int.TryParse(Environment.GetEnvironmentVariable("REDIS_PORT"), out var envRedisPort) + ? 
envRedisPort + : redisPort; + +ConnectionMultiplexer redis; +try +{ + redis = ConnectionMultiplexer.Connect($"{redisHost}:{redisPort}"); + redis.GetDatabase().Ping(); +} +catch (Exception ex) +{ + Console.Error.WriteLine($"Failed to connect to Redis at {redisHost}:{redisPort}: {ex.Message}"); + return 1; +} + +var stream = new EventStream( + redis.GetDatabase(), + streamKey: streamKey, + maxlenApprox: maxlen, + claimMinIdleMs: claimIdleMs); + +var defaultGroups = new Dictionary +{ + ["notifications"] = new[] { "worker-a", "worker-b" }, + ["analytics"] = new[] { "worker-c" }, +}; + +var demo = new StreamingDemoState(stream); +if (resetOnStart) +{ + Console.WriteLine( + $"Deleting any existing data at key '{streamKey}' " + + "for a clean demo run (pass --no-reset to keep it)."); + stream.DeleteStream(); +} +demo.Seed(defaultGroups); + +var builder = WebApplication.CreateBuilder(); +builder.WebHost.UseUrls($"http://0.0.0.0:{port}"); +builder.Logging.SetMinimumLevel(LogLevel.Warning); +var app = builder.Build(); + +var jsonOptions = new JsonSerializerOptions +{ + PropertyNamingPolicy = JsonNamingPolicy.SnakeCaseLower, +}; + +app.MapGet("/", () => Results.Content( + HtmlPage.Generate(stream.StreamKey, stream.MaxlenApprox, stream.ClaimMinIdleMs), + "text/html")); + +app.MapGet("/state", () => Results.Json(BuildState(), jsonOptions)); + +app.MapGet("/replay", (string? start, string? end, int? count) => +{ + var startId = string.IsNullOrEmpty(start) ? "-" : start; + var endId = string.IsNullOrEmpty(end) ? "+" : end; + var limit = Math.Max(1, Math.Min(500, count ?? 
20)); + var entries = stream.Replay(startId, endId, count: limit); + return Results.Json(new Dictionary + { + ["start"] = startId, + ["end"] = endId, + ["limit"] = limit, + ["entries"] = entries.Select(e => new Dictionary + { + ["id"] = e.Id, + ["fields"] = e.Fields, + }).ToList(), + }, jsonOptions); +}); + +var eventTypes = new[] { "order.placed", "order.paid", "order.shipped", "order.cancelled" }; + +app.MapPost("/produce", async (HttpContext ctx) => +{ + var form = await ctx.Request.ReadFormAsync(); + var count = Math.Max(1, Math.Min(500, int.TryParse(form["count"], out var c) ? c : 1)); + var rawType = (form["type"].ToString() ?? "").Trim(); + var events = new List<(string, IDictionary)>(count); + for (var i = 0; i < count; i++) + { + var picked = string.IsNullOrEmpty(rawType) + ? eventTypes[Random.Shared.Next(eventTypes.Length)] + : rawType; + events.Add((picked, FakePayload())); + } + var ids = stream.ProduceBatch(events); + return Results.Json(new Dictionary + { + ["produced"] = ids.Length, + ["ids"] = ids, + }, jsonOptions); +}); + +app.MapPost("/add-worker", async (HttpContext ctx) => +{ + var form = await ctx.Request.ReadFormAsync(); + var group = (form["group"].ToString() ?? "").Trim(); + var name = (form["name"].ToString() ?? "").Trim(); + if (string.IsNullOrEmpty(group) || string.IsNullOrEmpty(name)) + { + return Results.Json(new { error = "group and name are required" }, jsonOptions, statusCode: 400); + } + var added = demo.AddWorker(group, name); + if (!added) + { + return Results.Json(new { error = $"{group}/{name} already exists" }, jsonOptions, statusCode: 409); + } + return Results.Json(new { group, name }, jsonOptions); +}); + +app.MapPost("/remove-worker", async (HttpContext ctx) => +{ + var form = await ctx.Request.ReadFormAsync(); + var group = (form["group"].ToString() ?? "").Trim(); + var name = (form["name"].ToString() ?? 
"").Trim(); + var result = demo.RemoveWorker(group, name); + var status = result.Removed || result.Reason == "not-found" ? 200 : 409; + return Results.Json(new Dictionary + { + ["removed"] = result.Removed, + ["reason"] = result.Reason, + ["message"] = result.Message, + ["handed_over_to"] = result.HandedOverTo, + ["handed_over_count"] = result.HandedOverCount, + }, jsonOptions, statusCode: status); +}); + +app.MapPost("/crash", async (HttpContext ctx) => +{ + var form = await ctx.Request.ReadFormAsync(); + var group = (form["group"].ToString() ?? "").Trim(); + var name = (form["name"].ToString() ?? "").Trim(); + var count = int.TryParse(form["count"], out var c) ? c : 1; + var worker = demo.GetWorker(group, name); + if (worker is null) + { + return Results.Json(new { error = $"unknown consumer {group}/{name}" }, jsonOptions, statusCode: 404); + } + worker.CrashNext(count); + return Results.Json(new { queued = count }, jsonOptions); +}); + +app.MapPost("/autoclaim", async (HttpContext ctx) => +{ + // Have the chosen consumer reap stuck PEL entries into itself. + // This is the textbook XAUTOCLAIM recovery flow: each consumer + // periodically calls XAUTOCLAIM with itself as the target, then + // processes whatever was returned. The demo exposes it as a manual + // button so you can trigger the reap on a chosen consumer after + // waiting for the idle threshold. + var form = await ctx.Request.ReadFormAsync(); + var group = (form["group"].ToString() ?? "").Trim(); + var consumer = (form["consumer"].ToString() ?? "").Trim(); + if (string.IsNullOrEmpty(group) || string.IsNullOrEmpty(consumer)) + { + return Results.Json(new { error = "group and consumer are required" }, jsonOptions, statusCode: 400); + } + var worker = demo.GetWorker(group, consumer); + if (worker is null) + { + return Results.Json(new { error = $"unknown consumer {group}/{consumer}" }, jsonOptions, statusCode: 404); + } + // ReapIdlePel runs XAUTOCLAIM(self) + process + ack. 
deleted are + // PEL entries whose stream payload was already trimmed by MAXLEN ~ + // before the sweep ran. Redis 7+ removes them from the PEL inside + // XAUTOCLAIM itself, so the caller doesn't have to XACK them; in + // production they would be routed to a dead-letter store for + // offline inspection. + var result = worker.ReapIdlePel(); + return Results.Json(new Dictionary + { + ["claimed"] = result.Claimed, + ["processed"] = result.Processed, + ["deleted"] = result.DeletedIds, + ["min_idle_ms"] = stream.ClaimMinIdleMs, + }, jsonOptions); +}); + +app.MapPost("/trim", async (HttpContext ctx) => +{ + var form = await ctx.Request.ReadFormAsync(); + var maxlenForm = int.TryParse(form["maxlen"], out var m) ? m : 0; + var deleted = stream.TrimMaxlen(maxlenForm); + return Results.Json(new { deleted, maxlen = maxlenForm }, jsonOptions); +}); + +app.MapPost("/reset", () => +{ + var count = demo.Reset(defaultGroups); + return Results.Json(new { consumers = count }, jsonOptions); +}); + +Console.WriteLine($"Redis streaming demo server listening on http://localhost:{port}"); +Console.WriteLine( + $"Using Redis at {redisHost}:{redisPort} with stream key '{streamKey}' " + + $"(MAXLEN ~ {maxlen})"); +Console.WriteLine($"Seeded {defaultGroups.Sum(g => g.Value.Length)} consumer(s) across {defaultGroups.Count} group(s)"); + +AppDomain.CurrentDomain.ProcessExit += (_, _) => demo.StopAll(); + +app.Run(); +return 0; + +Dictionary BuildState() +{ + var streamInfo = stream.InfoStream(); + var groups = stream.InfoGroups(); + var workersSnapshot = demo.WorkersSnapshot(); + var groupsDetail = new List>(); + var pendingRows = new List>(); + + foreach (var group in groups) + { + var groupName = (string)group["name"]!; + var consumerInfo = stream.InfoConsumers(groupName) + .ToDictionary(c => (string)c["name"]!, c => c); + var consumersDetail = new List>(); + foreach (var (key, worker) in workersSnapshot) + { + if (key.Group != groupName) + { + continue; + } + 
consumerInfo.TryGetValue(key.Name, out var info); + var status = worker.Status(); + var combined = new Dictionary(status); + combined["pending"] = info is not null ? (long)info["pending"]! : 0L; + combined["idle_ms"] = info is not null ? (long)info["idle_ms"]! : 0L; + combined["recent"] = worker.Recent().Select(a => new Dictionary + { + ["id"] = a.Id, + ["type"] = a.Type, + ["fields"] = a.Fields, + ["acked"] = a.Acked, + ["note"] = a.Note, + }).ToList(); + consumersDetail.Add(combined); + } + // Also include consumers that exist in Redis but not in our + // in-process registry (e.g. orphaned after a restart). + foreach (var (consumerName, info) in consumerInfo) + { + if (consumersDetail.Any(c => (string)c["name"]! == consumerName)) + { + continue; + } + consumersDetail.Add(new Dictionary + { + ["name"] = consumerName, + ["group"] = groupName, + ["processed"] = 0, + ["reaped"] = 0, + ["crashed_drops"] = 0, + ["paused"] = false, + ["crash_queued"] = 0, + ["alive"] = false, + ["pending"] = (long)info["pending"]!, + ["idle_ms"] = (long)info["idle_ms"]!, + ["recent"] = new List(), + }); + } + consumersDetail.Sort((a, b) => string.Compare((string)a["name"]!, (string)b["name"]!, StringComparison.Ordinal)); + var combinedGroup = new Dictionary(group) + { + ["consumers_detail"] = consumersDetail, + }; + groupsDetail.Add(combinedGroup); + + foreach (var pending in stream.PendingDetail(groupName, count: 50)) + { + pendingRows.Add(new Dictionary + { + ["id"] = pending.Id, + ["consumer"] = pending.Consumer, + ["idle_ms"] = pending.IdleMs, + ["deliveries"] = pending.Deliveries, + ["group"] = groupName, + }); + } + } + + var tail = stream.Tail(count: 10).Select(e => new Dictionary + { + ["id"] = e.Id, + ["fields"] = e.Fields, + }).ToList(); + + return new Dictionary + { + ["stream"] = streamInfo, + ["tail"] = tail, + ["groups"] = groupsDetail, + ["pending"] = pendingRows, + ["stats"] = stream.Stats(), + }; +} + +static IDictionary FakePayload() +{ + var customers = new[] { 
"alice", "bob", "carol", "dan", "erin" }; + return new Dictionary + { + ["order_id"] = $"o-{Random.Shared.Next(1000, 9999)}", + ["customer"] = customers[Random.Shared.Next(customers.Length)], + ["amount"] = (5.0 + Random.Shared.NextDouble() * 245.0).ToString("F2"), + }; +} + +/// +/// In-memory registry of consumer workers across all groups. +/// +sealed class StreamingDemoState +{ + private readonly EventStream _stream; + private readonly Dictionary<(string Group, string Name), ConsumerWorker> _workers = new(); + private readonly object _lock = new(); + + public StreamingDemoState(EventStream stream) + { + _stream = stream; + } + + public int Seed(IDictionary groups) + { + lock (_lock) + { + foreach (var (group, names) in groups) + { + _stream.EnsureGroup(group, startId: "0-0"); + foreach (var name in names) + { + AddWorkerLocked(group, name); + } + } + return groups.Sum(g => g.Value.Length); + } + } + + public bool AddWorker(string group, string name) + { + lock (_lock) + { + return AddWorkerLocked(group, name); + } + } + + private bool AddWorkerLocked(string group, string name) + { + var key = (group, name); + if (_workers.ContainsKey(key)) + { + return false; + } + _stream.EnsureGroup(group, startId: "0-0"); + var worker = new ConsumerWorker(_stream, group, name); + worker.Start(); + _workers[key] = worker; + return true; + } + + public RemoveResult RemoveWorker(string group, string name) + { + ConsumerWorker? worker; + string? handoverTarget; + lock (_lock) + { + var key = (group, name); + if (!_workers.TryGetValue(key, out worker)) + { + return new RemoveResult(false, "not-found", null, null, 0); + } + // XGROUP DELCONSUMER destroys the consumer's PEL entries + // outright, so any pending message it still owned would + // become unreachable. Before deleting, hand its PEL off to + // another consumer in the same group with XCLAIM. Without + // a peer consumer to take over, refuse to delete. 
+ handoverTarget = _workers.Keys + .Where(k => k.Group == group && k.Name != name) + .Select(k => k.Name) + .FirstOrDefault(); + if (handoverTarget is null) + { + return new RemoveResult( + false, + "no-peer", + $"{group}/{name} still owns pending entries and is the only consumer in its group; " + + "add another consumer first so its PEL can be handed over before deletion.", + null, + 0); + } + } + + // Run the handover BEFORE removing the worker from the registry. + // XGROUP DELCONSUMER would destroy the source's pending list, so + // any handover failure must abort the removal — leaving the + // worker in place lets the user retry once the underlying Redis + // issue is resolved. The worker keeps consuming during the + // handover; XCLAIM with MIN-IDLE-TIME 0 races acks gracefully — + // anything the worker acks during the window is gone from + // XPENDING and isn't moved. + int handed; + try + { + handed = _stream.HandoverPending(group, fromConsumer: name, toConsumer: handoverTarget); + } + catch (Exception ex) + { + return new RemoveResult( + false, + "handover-failed", + $"Handover from {group}/{name} to {handoverTarget} failed before XGROUP DELCONSUMER could run: {ex.Message}. " + + $"{group}/{name} is still in the group; retry the remove or investigate the Redis error before deleting " + + "(DELCONSUMER would destroy the source consumer's pending entries).", + null, + 0); + } + + // Handover succeeded; now safe to remove from the registry, stop + // the worker, and destroy the consumer record in Redis. + lock (_lock) + { + _workers.Remove((group, name)); + } + worker.Stop(); + _stream.DeleteConsumer(group, name); + return new RemoveResult(true, null, null, handoverTarget, handed); + } + + public ConsumerWorker? GetWorker(string group, string name) + { + lock (_lock) + { + return _workers.TryGetValue((group, name), out var worker) ? 
worker : null; + } + } + + public List> WorkersSnapshot() + { + lock (_lock) + { + return _workers.ToList(); + } + } + + public void StopAll() + { + List snapshot; + lock (_lock) + { + snapshot = _workers.Values.ToList(); + _workers.Clear(); + } + foreach (var worker in snapshot) + { + worker.Stop(); + } + } + + public int Reset(IDictionary defaultGroups) + { + StopAll(); + _stream.DeleteStream(); + _stream.ResetStats(); + return Seed(defaultGroups); + } +} + +sealed record RemoveResult( + bool Removed, + string? Reason, + string? Message, + string? HandedOverTo, + int HandedOverCount); + +static class HtmlPage +{ + public static string Generate(string streamKey, int maxlen, long claimIdleMs) + { + return Template + .Replace("__STREAM_KEY__", System.Net.WebUtility.HtmlEncode(streamKey)) + .Replace("__MAXLEN__", maxlen.ToString()) + .Replace("__CLAIM_IDLE__", claimIdleMs.ToString()); + } + + private const string Template = """"" + + + + + + Redis Streaming Demo + + + +
+
StackExchange.Redis + ASP.NET Core minimal API
+

Redis Streaming Demo

+

+ Producers append events to a single Redis Stream + (__STREAM_KEY__). Two consumer groups read the same + stream independently: notifications shares its work + across two consumers, analytics processes the full + flow on its own. Acknowledge with XACK, recover + crashed deliveries with XAUTOCLAIM, replay any range + with XRANGE, and bound retention with XTRIM. +

+ +
+
+

Stream state

+
Loading...
+ + +
+ +
+

Produce events

+

Events are appended with XADD with an approximate + MAXLEN ~ __MAXLEN__ retention cap.

+ + + + + +
+ +
+

Replay range (XRANGE)

+

Reads a slice of history. Replay is independent of any + consumer group — no cursors move, no acks happen.

+ + + + + + + +
+ +
+

Trim retention (XTRIM)

+

Cap the stream length. Approximate trimming releases whole + macro-nodes, which is much cheaper than exact trimming.

+ + + +
+ +
+

Consumer groups

+
Loading...
+
+ +
+

Pending entries (XPENDING)

+

Entries delivered to a consumer that haven't been acked yet. + Idle time ≥ __CLAIM_IDLE__ ms is eligible for + XAUTOCLAIM.

+
Loading...
+
+ + +
+
+ +
+

Last result

+

Produce events, replay a range, or trigger an autoclaim to see results.

+
+
+ +
+
+ + + + +"""""; +} diff --git a/content/develop/use-cases/streaming/dotnet/StreamingDemo.csproj b/content/develop/use-cases/streaming/dotnet/StreamingDemo.csproj new file mode 100644 index 0000000000..e04b68a5b9 --- /dev/null +++ b/content/develop/use-cases/streaming/dotnet/StreamingDemo.csproj @@ -0,0 +1,14 @@ + + + + net8.0 + enable + enable + StreamingDemo + + + + + + + diff --git a/content/develop/use-cases/streaming/dotnet/_index.md b/content/develop/use-cases/streaming/dotnet/_index.md new file mode 100644 index 0000000000..fbaea97771 --- /dev/null +++ b/content/develop/use-cases/streaming/dotnet/_index.md @@ -0,0 +1,507 @@ +--- +categories: +- docs +- develop +- stack +- oss +- rs +- rc +description: Implement a Redis event-streaming pipeline in C# with StackExchange.Redis +linkTitle: StackExchange.Redis example (C#) +title: Redis streaming with StackExchange.Redis +weight: 6 +--- + +This guide shows you how to build a Redis-backed event-streaming pipeline in C# with [StackExchange.Redis](https://stackexchange.github.io/StackExchange.Redis/). It includes a small local web server built with ASP.NET Core's minimal API so you can produce events into a single Redis Stream, watch two independent consumer groups read it at their own pace, and recover stuck deliveries with `XAUTOCLAIM` after simulating a consumer crash. + +## Overview + +A Redis Stream is an append-only log of field/value entries with auto-generated, time-ordered IDs. Producers append with [`XADD`]({{< relref "/commands/xadd" >}}); consumers belong to *consumer groups* and read with [`XREADGROUP`]({{< relref "/commands/xreadgroup" >}}). The group as a whole tracks a single `last-delivered-id` cursor, and each consumer gets its own pending-entries list (PEL) of messages it has been handed but not yet acknowledged. 
Once a consumer has processed an entry it calls [`XACK`]({{< relref "/commands/xack" >}}) to clear the entry from its PEL; entries left unacknowledged past an idle threshold can be reassigned to a healthy consumer with [`XAUTOCLAIM`]({{< relref "/commands/xautoclaim" >}}). + +That gives you: + +* Ordered, durable history that many independent consumer groups can read at their own pace +* At-least-once delivery, with per-consumer pending lists and automatic recovery of crashed consumers +* Horizontal scaling within a group — add a consumer and Redis automatically splits the work +* Replay of any range with [`XRANGE`]({{< relref "/commands/xrange" >}}), independent of consumer-group state +* Bounded retention through [`XADD MAXLEN ~`]({{< relref "/commands/xadd" >}}) or + [`XTRIM MINID ~`]({{< relref "/commands/xtrim" >}}), without a separate cleanup job + +In this example, producers append order events (`order.placed`, `order.paid`, `order.shipped`, `order.cancelled`) to a single stream at `demo:events:orders`. Two consumer groups read the same stream: + +* **`notifications`** — two consumers (`worker-a`, `worker-b`) sharing the work, modelling a fan-out worker pool. +* **`analytics`** — one consumer (`worker-c`) processing the full event flow on its own. + +## How it works + +The flow looks like this: + +1. The application calls `stream.Produce(eventType, payload)` which runs [`XADD`]({{< relref "/commands/xadd" >}}) with an approximate [`MAXLEN ~`]({{< relref "/commands/xadd" >}}) cap. Redis assigns an auto-generated time-ordered ID. +2. Each consumer thread polls [`XREADGROUP`]({{< relref "/commands/xreadgroup" >}}) with the special ID `>` (meaning "deliver entries this group has not yet delivered to anyone") on a short interval. +3. After processing each entry, the consumer calls [`XACK`]({{< relref "/commands/xack" >}}) so Redis can drop it from the group's pending list. +4. If a consumer is killed (or crashes) before acking, its entries sit in the group's PEL. 
A periodic [`XAUTOCLAIM`]({{< relref "/commands/xautoclaim" >}}) sweep reassigns idle entries to a healthy consumer. +5. Anyone — including code outside the consumer groups — can read history with [`XRANGE`]({{< relref "/commands/xrange" >}}) without affecting any group's cursor. + +Each consumer group has its own cursor (`last-delivered-id`) and its own pending list, so the two groups in this demo process the same events without coordinating with each other. + +## The event-stream helper + +The `EventStream` class wraps the stream operations +([source](https://github.com/redis/docs/blob/main/content/develop/use-cases/streaming/dotnet/EventStream.cs)): + +```csharp +using StackExchange.Redis; +using StreamingDemo; + +var redis = ConnectionMultiplexer.Connect("localhost:6379"); +var stream = new EventStream( + redis.GetDatabase(), + streamKey: "demo:events:orders", + maxlenApprox: 2000, // retention guardrail + claimMinIdleMs: 5000); // XAUTOCLAIM threshold + +// Producer +var streamId = stream.Produce( + "order.placed", + new Dictionary + { + ["order_id"] = "o-1234", + ["customer"] = "alice", + ["amount"] = "49.50", + }); + +// Consumer group + one consumer +stream.EnsureGroup("notifications", startId: "0-0"); +var entries = stream.Consume("notifications", "worker-a", count: 10); +foreach (var entry in entries) +{ + Handle(entry.Fields); // your processing + stream.Ack("notifications", new[] { entry.Id }); // XACK +} + +// Recover stuck PEL entries by reaping them into a healthy consumer. +// The textbook pattern: each consumer periodically calls XAUTOCLAIM +// with itself as the target and processes whatever it claimed. +// ConsumerWorker.ReapIdlePel wraps that flow; the low-level helper +// stream.Autoclaim(group, targetName) is also available if you want +// to drive XAUTOCLAIM directly. +var result = workerB.ReapIdlePel(); +// result == new ReapResult(Claimed: N, DeletedIds: [...], Processed: M) +// DeletedIds are PEL entries whose payload was already trimmed. 
+// Redis 7+ has already removed those slots from the PEL, so no XACK +// is needed — log them and route to a dead-letter store for audit. + +// Replay history (independent of any group's cursor) +foreach (var entry in stream.Replay("-", "+", count: 50)) +{ + Console.WriteLine($"{entry.Id} {string.Join(",", entry.Fields)}"); +} +``` + +### Data model + +Each event is a single stream entry — a flat dictionary of field/value strings — with an auto-generated time-ordered ID: + +```text +demo:events:orders + 1716998413541-0 type=order.placed order_id=o-1234 customer=alice amount=49.50 ts_ms=... + 1716998413542-0 type=order.paid order_id=o-1234 customer=alice amount=49.50 ts_ms=... + 1716998413542-1 type=order.shipped order_id=o-1235 customer=bob amount=12.00 ts_ms=... + ... +``` + +The ID is `{milliseconds}-{sequence}`, monotonically increasing within the stream, so you can range-query by approximate wall-clock time without an extra index. (IDs are ordered within a stream, not across streams — two events appended to different streams at the same millisecond can produce the same ID.) The implementation uses: + +* [`XADD ... 
MAXLEN ~ n`]({{< relref "/commands/xadd" >}}), pipelined through `IBatch`, for batch production with a retention cap +* [`XREADGROUP`]({{< relref "/commands/xreadgroup" >}}) with the special ID `>` for fresh deliveries to a consumer +* [`XACK`]({{< relref "/commands/xack" >}}) on every processed entry +* [`XAUTOCLAIM`]({{< relref "/commands/xautoclaim" >}}) for sweeping idle pending entries to a healthy consumer +* [`XRANGE`]({{< relref "/commands/xrange" >}}) for replay and audit +* [`XPENDING`]({{< relref "/commands/xpending" >}}) for inspecting the per-group pending list +* [`XINFO STREAM`]({{< relref "/commands/xinfo-stream" >}}), + [`XINFO GROUPS`]({{< relref "/commands/xinfo-groups" >}}), and + [`XINFO CONSUMERS`]({{< relref "/commands/xinfo-consumers" >}}) for surface-level observability +* [`XTRIM`]({{< relref "/commands/xtrim" >}}) for explicit retention enforcement + +## Producing events + +`ProduceBatch` pipelines `XADD` calls in a single round trip through `IDatabase.CreateBatch()`. Each call carries an approximate `MAXLEN ~` cap so the stream stays bounded as it rolls forward: + +```csharp +public string[] ProduceBatch(IEnumerable<(string EventType, IDictionary<string, string> Payload)> events) +{ + var eventList = events.ToList(); + var batch = _db.CreateBatch(); + var addTasks = new Task<RedisValue>[eventList.Count]; + for (var i = 0; i < eventList.Count; i++) + { + var (eventType, payload) = eventList[i]; + var pairs = EncodeFields(eventType, payload); + addTasks[i] = batch.StreamAddAsync( + _streamKey, + pairs, + messageId: null, + maxLength: _maxlenApprox, + useApproximateMaxLength: true); + } + batch.Execute(); + Task.WaitAll(addTasks); + + var ids = addTasks.Select(t => (string)t.Result!).ToArray(); + Interlocked.Add(ref _producedTotal, ids.Length); + return ids; +} +``` + +The `~` flavour of `MAXLEN` lets Redis trim at a macro-node boundary, which is much cheaper than exact trimming and is what you want when the cap is a retention *guardrail*, not a hard size constraint. 
With 300 events produced and `MAXLEN ~ 50`, you might end up with 100 entries left: Redis only drops whole macro-nodes, so it stops trimming once removing another node would take the stream below the cap. Later `XADD` calls trim the same way, so the length stays roughly stable. + +If you genuinely need an exact cap (rare), pass `useApproximateMaxLength: false`. The performance difference is significant on busy streams. + +## Reading with a consumer group + +Each consumer in a group runs the same `XREADGROUP` poll. The special ID `>` means "deliver entries this group has not yet delivered to *anyone*": + +```csharp +public List<StreamRecord> Consume(string group, string consumer, int count = 10) +{ + var entries = _db.StreamReadGroup( + _streamKey, + group, + consumer, + position: ">", + count: count); + return ToRecords(entries); +} +``` + +StackExchange.Redis intentionally does not expose a blocking `XREADGROUP` (the long-blocking variant would monopolise the multiplexer's single command pipeline). The consumer thread polls on a short fixed interval (100 ms in the demo), so each call returns promptly when nothing is waiting and new entries are picked up within one polling interval of being appended. A production helper would naturally be `async` (`StreamReadGroupAsync` with `await Task.Delay`), which gives the same effect without parking a thread. + +Reading with an explicit ID like `0-0` instead of `>` does something different: it replays entries already delivered to *this* consumer name (its private PEL). That is the canonical recovery path when the same consumer restarts, which catches up on its own pending entries first and then resumes reading new ones. 
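The restart path described above can be sketched as a small loop. This is a minimal sketch under stated assumptions, not part of the demo source: `ReadBatch`, `RestartRecovery`, and `DrainOwnPending` are hypothetical names, and the `read` delegate stands in for a call such as `StreamReadGroup` with an explicit `position`. It relies on the `XREADGROUP` behaviour that an explicit ID returns this consumer's pending entries with IDs greater than the one given.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical delegate: reads up to `count` entry IDs for one consumer.
// In a real consumer this would wrap a call such as
// _db.StreamReadGroup(key, group, consumer, position: position, count: count).
public delegate IReadOnlyList<string> ReadBatch(string position, int count);

public static class RestartRecovery
{
    // Pages through the consumer's own PEL, oldest first, before the
    // consumer switches back to reading new entries with ">".
    // Returns the number of pending entries handed to `handle`.
    public static int DrainOwnPending(ReadBatch read, Action<string> handle, int count = 10)
    {
        var cursor = "0-0";
        var recovered = 0;
        while (true)
        {
            var batch = read(cursor, count);
            if (batch.Count == 0)
            {
                return recovered;
            }
            foreach (var id in batch)
            {
                handle(id); // process the entry, then XACK it
                recovered++;
            }
            // Advance past the last ID seen so the loop makes progress
            // even if an entry was not acknowledged.
            cursor = batch[batch.Count - 1];
        }
    }
}
```

Once `DrainOwnPending` returns, the consumer would resume its normal poll with `>`. Advancing the cursor rather than re-reading from `0-0` is a deliberate choice in this sketch: it keeps one failing entry from stalling the whole catch-up pass.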
+ +## Acknowledging entries + +Once the consumer has processed an entry, `XACK` tells Redis it can drop the entry from the group's pending list: + +```csharp +public long Ack(string group, IEnumerable<string> ids) +{ + var idArray = ids.Select(id => (RedisValue)id).ToArray(); + if (idArray.Length == 0) return 0; + var n = _db.StreamAcknowledge(_streamKey, group, idArray); + Interlocked.Add(ref _ackedTotal, n); + return n; +} +``` + +This is the linchpin of at-least-once delivery: an entry that is never acked stays in the PEL, and a claim only moves it to another consumer's pending list. If your consumer thread crashes between processing and ack, the next claim sweep picks the entry back up. The one caveat is retention: `XADD MAXLEN ~` and `XTRIM` can release the entry's *payload* even while its ID is still in the PEL. The next `XAUTOCLAIM` returns those IDs in its `DeletedIds` list and removes them from the PEL inside the same command. Such an entry cannot be retried, so the caller should log it and route it to a dead-letter store for audit. The example handles this explicitly in `/autoclaim` further down. + +The trade-off is the opposite of pub/sub: a slow or crashed consumer doesn't lose messages, but your downstream processing must be idempotent. If you process an order twice because the first attempt died after the side effect but before the ack, the second attempt must be safe. + +## Multiple consumer groups, one stream + +The big difference between Redis Streams and a job queue is that any number of independent consumer groups can read the same stream. The demo sets up two groups on `demo:events:orders`: + +```csharp +stream.EnsureGroup("notifications", startId: "0-0"); +stream.EnsureGroup("analytics", startId: "0-0"); +``` + +Each group has its own cursor. Producing 5 events results in `notifications` and `analytics` each receiving all 5, with no coordination between them. 
Within `notifications`, the work is split across `worker-a` and `worker-b`: Redis hands each `XREADGROUP` call whatever entries are not yet delivered to anyone in the group, so adding a second worker adds processing capacity without any rebalance logic. + +The `startId: "0-0"` argument means "deliver everything in the stream from the beginning", which is useful in a demo and for fresh groups bootstrapping from history. In production, a brand-new group reading a long-existing stream usually starts at `$` ("only events after this point") and uses [`XRANGE`]({{< relref "/commands/xrange" >}}) explicitly if it needs history. + +## Recovering crashed consumers with XAUTOCLAIM + +The demo's "Crash next 3" button tells a chosen consumer to drop its next three deliveries on the floor without acking them, the same effect as a worker process dying mid-message. Those entries stay in the group's PEL with their delivery counter incremented. Once they have been idle for at least `claimMinIdleMs`, any healthy consumer in the group can rescue them by calling `XAUTOCLAIM` *with itself as the target*. 
`ConsumerWorker.ReapIdlePel` wraps that pattern: + +```csharp +public ReapResult ReapIdlePel() +{ + var swept = _stream.Autoclaim(Group, Name, pageCount: 100, maxPages: 10); + var processed = 0; + foreach (var record in swept.Claimed) + { + try + { + HandleEntry(record.Id, record.Fields); + processed++; + } + catch (Exception ex) + { + Console.Error.WriteLine($"[{Group}/{Name}] reap failed on {record.Id}: {ex.Message}"); + } + } + lock (_lock) { _reaped += processed; } + return new ReapResult(swept.Claimed.Count, swept.DeletedIds, processed); +} +``` + +The underlying `stream.Autoclaim` helper pages through the group's PEL with `XAUTOCLAIM`'s continuation cursor, using StackExchange.Redis 2.7+'s typed `StreamAutoClaim` wrapper: + +```csharp +public AutoClaimResult Autoclaim( + string group, + string consumer, + int pageCount = 100, + string startId = "0-0", + int maxPages = 10) +{ + var claimedAll = new List<StreamRecord>(); + var deletedAll = new List<string>(); + var cursor = startId; + for (var i = 0; i < maxPages; i++) + { + var result = _db.StreamAutoClaim( + _streamKey, + group, + consumer, + minIdleTimeInMs: _claimMinIdleMs, + startAtId: cursor, + count: pageCount); + if (result.IsNull) break; + foreach (var entry in result.ClaimedEntries) + claimedAll.Add(EntryToRecord(entry)); + foreach (var id in result.DeletedIds ?? Array.Empty<RedisValue>()) + deletedAll.Add((string)id!); + var nextId = (string)result.NextStartId!; + if (nextId == "0-0") break; + cursor = nextId; + } + Interlocked.Add(ref _claimedTotal, claimedAll.Count); + return new AutoClaimResult(claimedAll, deletedAll); +} +``` + +A single `XAUTOCLAIM` call starts scanning the PEL at `startId`, reassigns up to `pageCount` entries that have been idle for at least `minIdleTimeInMs` to the named consumer, and returns a continuation cursor on `StreamAutoClaimResult.NextStartId`. For a full sweep, loop until the cursor returns to `0-0` (with a `maxPages` safety net so one call cannot monopolise a very large PEL). 
The delivery counter is incremented on every claim — after a few cycles you can use it to spot a *poison-pill* message that crashes every consumer that touches it, and route it to a dead-letter stream so the bad entry stops cycling. (New entries keep flowing past the poison pill — `XREADGROUP >` still delivers fresh work — but the bad entry's repeated reclaim wastes consumer time and keeps the PEL larger than it needs to be.) + +The `StreamAutoClaimResult.DeletedIds` list contains PEL entry IDs whose stream payload was already trimmed by the time the claim ran (typically because `MAXLEN ~` retention outran a slow consumer). `XAUTOCLAIM` removes those dangling slots from the PEL itself, so the caller does *not* need to `XACK` them — but the entries cannot be retried either, so log and route them to a dead-letter store for offline inspection. Redis 7.0 introduced this third return element; the example requires Redis 7.0+ for that reason. + +`ReapIdlePel` is the right primitive for the recovery path because it claims and processes in one step: every entry the call returned is now in *this* consumer's PEL, so the same consumer is responsible for processing and acking it. In production each consumer thread runs `ReapIdlePel` periodically (every few seconds, on a timer) so a crashed peer's entries never sit invisibly. The demo exposes it as a manual button so you can trigger the reap after waiting for the idle threshold. + +`XCLAIM` (singular, no auto) does the same thing for a specific list of entry IDs you already have in hand — useful when you want to take ownership of one known stuck entry, or when you need to move a specific consumer's PEL to a peer (the case the demo's "Remove consumer" button handles via `HandoverPending`). `XAUTOCLAIM` cannot filter by source consumer, so it cannot be used for a per-consumer handover. + +## Replay with XRANGE + +`XRANGE` reads a slice of history. 
It is completely independent of any consumer group — no cursors move, no acks happen — so it is safe to call any number of times, from any process: + +```csharp +public List Replay(string startId = "-", string endId = "+", int count = 100) +{ + var entries = _db.StreamRange( + _streamKey, + minId: startId, + maxId: endId, + count: count); + return entries.Select(EntryToRecord).ToList(); +} +``` + +The special IDs `-` and `+` mean "from the very beginning" and "to the very end". You can also pass real IDs (`1716998413541-0`) or just the millisecond part (`1716998413541`, which Redis interprets as "any entry with this timestamp"). + +Typical uses: + +* **Bootstrapping a new projection** — read the entire stream from `-` and build a derived view in another store (a search index, a SQL table, a different cache). Doing this against a consumer group would consume the entries; `XRANGE` lets you do it without disrupting live consumers. +* **Auditing recent activity** — read the last few minutes by ID range without touching any group cursor. +* **Debugging** — fetch one specific entry by its ID, or a tight range around an incident timestamp, to see exactly what producers wrote. 
+ +## The consumer worker thread + +`ConsumerWorker` wraps the `XREADGROUP` → process → `XACK` loop in a background thread +([source](https://github.com/redis/docs/blob/main/content/develop/use-cases/streaming/dotnet/ConsumerWorker.cs)): + +```csharp +private void Run() +{ + while (!_stop) + { + bool paused; + lock (_lock) { paused = _paused; } + if (paused) { Thread.Sleep(50); continue; } + + List entries; + try + { + entries = _stream.Consume(Group, Name, count: 10); + } + catch (Exception ex) + { + Console.Error.WriteLine($"[{Group}/{Name}] read failed: {ex.Message}"); + Thread.Sleep(500); + continue; + } + + if (entries.Count == 0) + { + Thread.Sleep(_pollIntervalMs); + continue; + } + + foreach (var entry in entries) + { + Dispatch(entry.Id, entry.Fields); + } + } +} +``` + +`HandleEntry` either acks (the normal path) or, when the demo has asked the worker to "crash", drops the entry on the floor and increments a counter so the UI can show what is currently in the PEL waiting to be claimed. + +Recovery of stuck PEL entries — this consumer's, after a restart, or another consumer's, after a crash — runs through a separate `ReapIdlePel` method rather than the read loop. That method calls `XAUTOCLAIM` with this consumer as the target, then processes whatever was claimed in the same flow as new entries. This is the textbook Streams pattern: each consumer is its own reaper, running `XAUTOCLAIM(self)` periodically (or on demand) so a crashed peer's entries never sit invisibly in the PEL. The demo's "XAUTOCLAIM to selected" button calls `ReapIdlePel` on the chosen consumer; in production you would run it from a timer every few seconds. + +Note that the worker's main read loop deliberately does *not* call `XREADGROUP 0` to drain its own PEL on every iteration. That would re-deliver every pending entry continuously and *reset its idle counter to zero* each time, which would keep crashed entries below the `XAUTOCLAIM` threshold forever. 
Using `XAUTOCLAIM(self)` as the recovery primitive — which only fires for entries idle longer than `minIdleTime` — avoids that whole class of bug. + +The pause and crash levers exist only for the demo. A real consumer is just the read-process-ack loop — everything else in this class is instrumentation. + +## Prerequisites + +* Redis 7.0 or later. `XAUTOCLAIM` was added in Redis 6.2, but its reply gained a third + element (the list of deleted IDs) in 7.0; the example relies on that shape. +* .NET 8 SDK or later. +* The `StackExchange.Redis` NuGet package at version 2.7 or later (already declared in `StreamingDemo.csproj`). + +If your Redis server is running elsewhere, start the demo with `--redis-host` and `--redis-port`. + +## Running the demo + +### Get the source files + +The demo consists of four files. Download them from the [`dotnet` source folder](https://github.com/redis/docs/tree/main/content/develop/use-cases/streaming/dotnet) on GitHub, or grab them with `curl`: + +```bash +mkdir streaming-demo && cd streaming-demo +BASE=https://raw.githubusercontent.com/redis/docs/main/content/develop/use-cases/streaming/dotnet +curl -O $BASE/EventStream.cs +curl -O $BASE/ConsumerWorker.cs +curl -O $BASE/Program.cs +curl -O $BASE/StreamingDemo.csproj +``` + +### Start the demo server + +From that directory: + +```bash +dotnet run +``` + +You should see: + +```text +Deleting any existing data at key 'demo:events:orders' for a clean demo run (pass --no-reset to keep it). +Redis streaming demo server listening on http://localhost:8785 +Using Redis at localhost:6379 with stream key 'demo:events:orders' (MAXLEN ~ 2000) +Seeded 3 consumer(s) across 2 group(s) +``` + +By default the demo wipes the configured stream key on startup so each run starts from a clean state. Pass `--no-reset` to keep any existing data at the key (useful when re-running against the same stream to inspect prior state), or `--stream-key ` to point the demo at a different key entirely. 
+ +Open [http://localhost:8785](http://localhost:8785) in a browser. You can: + +* **Produce** any number of events of a chosen type (or random types). Watch the stream length grow and the tail update. +* See each **consumer group**: its `last-delivered-id`, the size of its pending list, and the consumers in it. Each consumer shows its processed count, pending count, and idle time. +* **Add or remove** consumers within a group at runtime to see Redis split the work across the new shape. +* Click **Crash next 3** on a consumer to drop its next three deliveries — the same effect as a worker process dying after `XREADGROUP` but before `XACK`. Watch the **Pending entries (XPENDING)** panel fill up. +* Wait until the idle time exceeds the threshold (default 5000 ms), pick a healthy target consumer, and click **XAUTOCLAIM to selected** — the stuck entries are reassigned and the delivery counter increments. +* **Replay (XRANGE)** any range to confirm the full history is independent of consumer-group state. +* **XTRIM** with an approximate `MAXLEN` to bound retention. Note that an approximate trim only releases whole macro-nodes — `MAXLEN ~ 50` on a small stream may not delete anything; on a 300-entry stream it typically lands at around 100. +* Click **Reset demo** to drop the stream and re-seed the default groups. + +## Production usage + +### Use the async API on the request path + +The demo helper is synchronous (`StreamAdd`, `StreamReadGroup`, `StreamAutoClaim`, `Thread.Sleep`) to keep the code compact. In production, prefer the `Async` overloads (`StreamAddAsync`, `StreamReadGroupAsync`, `StreamAutoClaimAsync`, `StreamAcknowledgeAsync`) together with `await Task.Delay`, so request-handling threads return to the ThreadPool while a Redis call is in flight. The streaming structure is identical; just propagate `await`s through the call chain. + +### ThreadPool sizing for synchronous consumers + +`Program.cs` calls `ThreadPool.SetMinThreads(64, 64)` at startup.
With multiple consumer groups each running a polling thread per consumer, the default ThreadPool can grow too slowly under load and starve the polling threads. Raising the floor up front is a property of *this synchronous demo*; an async helper (see above) avoids the issue entirely because the poll naturally yields the thread between iterations. + +### Pick retention by length or by minimum ID + +The demo uses `MAXLEN ~` on every `XADD`. Two alternatives are worth considering: + +* `MINID ~ ` — keep only entries newer than an ID. If you want "the last 24 hours", compute the wall-clock cutoff and call `stream.TrimMinid("{ms}-0")`. This is the right pattern when retention is time-bounded. +* No cap on `XADD` plus a periodic `XTRIM` job — useful if your producer is hot and the per-`XADD` work has to stay minimal, or if retention rules are complex (a separate process can also factor in consumer-group lag). + +In all three cases the trimming is approximate by default. Use exact trimming (`MAXLEN n` or `MINID id` without `~`) only when you genuinely need an exact count. + +### Don't let consumer-group lag silently grow + +`XINFO GROUPS` reports each group's `lag` (entries the group has not yet read) and `pending` (entries delivered but not acked). In production, alert on either of these crossing a threshold — a steadily growing pending count usually means consumers are crashing without `XAUTOCLAIM` running, and a growing lag means consumers can't keep up with producers. + +The same applies inside a group: `XINFO CONSUMERS` reports per-consumer pending counts and idle times, so you can spot one slow consumer holding entries that the rest of the group is waiting on. + +### Make consumer logic idempotent + +`XAUTOCLAIM` can re-deliver an entry to a different consumer after a crash. 
If your processing has side effects (sending email, charging a card, updating a downstream store), make sure the same entry processed twice gives the same result — use an idempotency key, an upsert with conditional check, or a once-per-id guard table. Redis Streams cannot give you exactly-once semantics on its own. + +### Bound the delivery counter as a poison-pill signal + +`XPENDING` returns each entry's delivery count, incremented on every claim. If an entry has been delivered (and dropped) several times, the next consumer is unlikely to fare better. After some threshold — `deliveries >= 5`, say — route the entry to a *dead-letter stream*, ack it on the original group, and alert. New entries keep flowing past a poison pill (`XREADGROUP >` still delivers fresh work), but the bad entry's repeated reclaim wastes consumer time and keeps the PEL bigger than it needs to be — without a DLQ threshold it can also slowly trip retention/lag alerts. + +### Partition by tenant or entity for scale + +A single Redis Stream is a single key, and on a Redis Cluster a single key lives on a single shard. If your throughput exceeds what one shard can handle, partition the stream — for example by tenant ID (`events:orders:{tenant_a}`, `events:orders:{tenant_b}`) — so different tenants land on different shards. Hash-tags (`{tenant_a}`) keep all related streams on the same shard if you need to multi-stream atomically. + +Per-entity partitioning (`events:order:{order_id}`) is the canonical pattern when you treat each entity's stream as the event-sourcing log for that entity: every state change for one order goes on its own stream, which is also bounded in size by the entity's lifetime. + +### Use a separate consumer pool per group + +The demo runs every consumer in one process. In production each consumer group is usually its own deployment — its own pool of pods or VMs — so a slow projection in `analytics` cannot pull `notifications` workers off their stream. 
Each pod runs one consumer thread per CPU core, with `XAUTOCLAIM` either embedded in the consumer loop (every N reads, claim idle entries to self) or run by a separate reaper. + +### StackExchange.Redis does not expose a blocking XREADGROUP + +StackExchange.Redis intentionally omits the long-blocking variant of `XREADGROUP` (and `XREAD`) because a blocking command would monopolise the multiplexer's single command pipeline — every other call on the same connection would queue behind it. The idiomatic workaround for this demo is a short polling interval on the synchronous wrapper; the idiomatic production pattern is `StreamReadGroupAsync` with `await Task.Delay`, which yields the thread between iterations without ever holding the multiplexer. + +### Don't read with XREAD (no group) and then try to ack + +`XREAD` and `XREADGROUP` are different mechanisms. `XREAD` is a tail-the-log read with no consumer-group state — entries are not added to any PEL, and you cannot `XACK` them. If you want at-least-once delivery and crash recovery, you must read through a consumer group. + +`XREAD` is still useful for read-only tail clients (a UI streaming events, a debugger, a `tail -f`-style command-line tool). It's just not part of the at-least-once path. 
+ +### Inspect the stream directly with redis-cli + +When testing or troubleshooting, inspect the stream directly to confirm the consumer state is what you expect: + +```bash +# Stream summary +redis-cli XLEN demo:events:orders +redis-cli XINFO STREAM demo:events:orders + +# Group cursors and pending counts +redis-cli XINFO GROUPS demo:events:orders + +# Consumers within a group +redis-cli XINFO CONSUMERS demo:events:orders notifications + +# Pending entries with idle time and delivery count +redis-cli XPENDING demo:events:orders notifications - + 20 + +# Tail the stream live (no consumer-group state — like tail -f) +redis-cli XREAD BLOCK 0 STREAMS demo:events:orders '$' + +# Replay a range +redis-cli XRANGE demo:events:orders - + COUNT 50 +``` + +If a group's `lag` is growing while consumers' `idle` times are short, consumers are healthy but producers are outpacing them — add more consumers. If `pending` is growing while `lag` is small, consumers are *receiving* entries but not *acking* them — either they are crashing mid-message or your acking logic has a bug. + +## Learn more + +This example uses the following Redis commands: + +* [`XADD`]({{< relref "/commands/xadd" >}}) to append an event with an approximate `MAXLEN` cap. +* [`XREADGROUP`]({{< relref "/commands/xreadgroup" >}}) to read new entries for a consumer in a group. +* [`XACK`]({{< relref "/commands/xack" >}}) to acknowledge a processed entry. +* [`XAUTOCLAIM`]({{< relref "/commands/xautoclaim" >}}) to reassign idle pending entries to a healthy consumer. +* [`XCLAIM`]({{< relref "/commands/xclaim" >}}) to take ownership of a specific list of pending entry IDs by hand (used by `HandoverPending` to move a leaving consumer's PEL to a peer, since `XAUTOCLAIM` has no source-consumer filter). +* [`XRANGE`]({{< relref "/commands/xrange" >}}) for replay and audit, independent of consumer-group state. 
+* [`XPENDING`]({{< relref "/commands/xpending" >}}) to inspect the per-group pending list with idle times and delivery counts. +* [`XTRIM`]({{< relref "/commands/xtrim" >}}) for explicit retention enforcement. +* [`XGROUP CREATE`]({{< relref "/commands/xgroup-create" >}}) and + [`XGROUP DELCONSUMER`]({{< relref "/commands/xgroup-delconsumer" >}}) to manage groups and consumers. +* [`XINFO STREAM`]({{< relref "/commands/xinfo-stream" >}}), + [`XINFO GROUPS`]({{< relref "/commands/xinfo-groups" >}}), and + [`XINFO CONSUMERS`]({{< relref "/commands/xinfo-consumers" >}}) for observability. + +See the [StackExchange.Redis docs](https://stackexchange.github.io/StackExchange.Redis/) for the full client reference, and the [Streams overview]({{< relref "/develop/data-types/streams" >}}) for the deeper conceptual model — consumer groups, the PEL, claim semantics, capped streams, and the differences with Kafka partitions. diff --git a/content/develop/use-cases/streaming/go/_index.md b/content/develop/use-cases/streaming/go/_index.md new file mode 100644 index 0000000000..10b7f9b8a6 --- /dev/null +++ b/content/develop/use-cases/streaming/go/_index.md @@ -0,0 +1,514 @@ +--- +categories: +- docs +- develop +- stack +- oss +- rs +- rc +description: Implement a Redis event-streaming pipeline in Go with go-redis +linkTitle: go-redis example (Go) +title: Redis streaming with go-redis +weight: 3 +--- + +This guide shows you how to build a Redis-backed event-streaming pipeline in Go with [`go-redis`]({{< relref "/develop/clients/go" >}}). It includes a small local web server built with Go's standard `net/http` package so you can produce events into a single Redis Stream, watch two independent consumer groups read it at their own pace, and recover stuck deliveries with `XAUTOCLAIM` after simulating a consumer crash. + +## Overview + +A Redis Stream is an append-only log of field/value entries with auto-generated, time-ordered IDs. 
Producers append with [`XADD`]({{< relref "/commands/xadd" >}}); consumers belong to *consumer groups* and read with [`XREADGROUP`]({{< relref "/commands/xreadgroup" >}}). The group as a whole tracks a single `last-delivered-id` cursor, and each consumer gets its own pending-entries list (PEL) of messages it has been handed but not yet acknowledged. Once a consumer has processed an entry it calls [`XACK`]({{< relref "/commands/xack" >}}) to clear the entry from its PEL; entries left unacknowledged past an idle threshold can be reassigned to a healthy consumer with [`XAUTOCLAIM`]({{< relref "/commands/xautoclaim" >}}). + +That gives you: + +* Ordered, durable history that many independent consumer groups can read at their own pace +* At-least-once delivery, with per-consumer pending lists and automatic recovery of crashed consumers +* Horizontal scaling within a group — add a consumer and Redis automatically splits the work +* Replay of any range with [`XRANGE`]({{< relref "/commands/xrange" >}}), independent of consumer-group state +* Bounded retention through [`XADD MAXLEN ~`]({{< relref "/commands/xadd" >}}) or + [`XTRIM MINID ~`]({{< relref "/commands/xtrim" >}}), without a separate cleanup job + +In this example, producers append order events (`order.placed`, `order.paid`, `order.shipped`, `order.cancelled`) to a single stream at `demo:events:orders`. Two consumer groups read the same stream: + +* **`notifications`** — two consumers (`worker-a`, `worker-b`) sharing the work, modelling a fan-out worker pool. +* **`analytics`** — one consumer (`worker-c`) processing the full event flow on its own. + +## How it works + +The flow looks like this: + +1. The application calls `stream.Produce(ctx, eventType, payload)` which runs [`XADD`]({{< relref "/commands/xadd" >}}) with an approximate [`MAXLEN ~`]({{< relref "/commands/xadd" >}}) cap. Redis assigns an auto-generated time-ordered ID. +2. 
Each consumer goroutine loops on [`XREADGROUP`]({{< relref "/commands/xreadgroup" >}}) with the special ID `>` (meaning "deliver entries this group has not yet delivered to anyone") and a short block timeout. +3. After processing each entry, the consumer calls [`XACK`]({{< relref "/commands/xack" >}}) so Redis can drop it from the group's pending list. +4. If a consumer is killed (or crashes) before acking, its entries sit in the group's PEL. A periodic [`XAUTOCLAIM`]({{< relref "/commands/xautoclaim" >}}) sweep reassigns idle entries to a healthy consumer. +5. Anyone — including code outside the consumer groups — can read history with [`XRANGE`]({{< relref "/commands/xrange" >}}) without affecting any group's cursor. + +Each consumer group has its own cursor (`last-delivered-id`) and its own pending list, so the two groups in this demo process the same events without coordinating with each other. + +## The event-stream helper + +The `EventStream` type wraps the stream operations +([source](https://github.com/redis/docs/blob/main/content/develop/use-cases/streaming/go/event_stream.go)): + +```go +package main + +import ( + "context" + + "github.com/redis/go-redis/v9" + "streaming" +) + +func main() { + client := redis.NewClient(&redis.Options{Addr: "localhost:6379"}) + stream := streaming.NewEventStream( + client, + "demo:events:orders", + 2000, // approximate MAXLEN retention guardrail + 5000, // XAUTOCLAIM idle threshold, in ms + ) + + ctx := context.Background() + + // Producer + streamID, _ := stream.Produce(ctx, "order.placed", map[string]string{ + "order_id": "o-1234", "customer": "alice", "amount": "49.50", + }) + _ = streamID + + // Consumer group + one consumer + _ = stream.EnsureGroup(ctx, "notifications", "0-0") + entries, _ := stream.Consume(ctx, "notifications", "worker-a", 10, 500) + for _, e := range entries { + handle(e.Fields) // your processing + _, _ = stream.Ack(ctx, "notifications", []string{e.ID}) // XACK + } + + // Recover stuck PEL entries 
by reaping them into a healthy consumer. + // The textbook pattern: each consumer periodically calls XAUTOCLAIM + // with itself as the target and processes whatever it claimed. + // `ConsumerWorker.ReapIdlePel` wraps that flow; the low-level helper + // `stream.Autoclaim(group, targetName, ...)` is also available if + // you want to drive XAUTOCLAIM directly. + result, _ := workerB.ReapIdlePel(ctx) + // result == streaming.ReapResult{Claimed: N, Processed: M, DeletedIDs: [...]} + // DeletedIDs are PEL entries whose payload was already trimmed. + // Redis 7+ has already removed those slots from the PEL, so no XACK + // is needed — log them and route to a dead-letter store for audit. + + // Replay history (independent of any group's cursor) + history, _ := stream.Replay(ctx, "-", "+", 50) + for _, e := range history { + _ = e + } +} +``` + +### Data model + +Each event is a single stream entry — a flat map of field/value strings — with an auto-generated time-ordered ID: + +```text +demo:events:orders + 1716998413541-0 type=order.placed order_id=o-1234 customer=alice amount=49.50 ts_ms=... + 1716998413542-0 type=order.paid order_id=o-1234 customer=alice amount=49.50 ts_ms=... + 1716998413542-1 type=order.shipped order_id=o-1235 customer=bob amount=12.00 ts_ms=... + ... +``` + +The ID is `{milliseconds}-{sequence}`, monotonically increasing within the stream, so you can range-query by approximate wall-clock time without an extra index. (IDs are ordered within a stream, not across streams — two events appended to different streams at the same millisecond can produce the same ID.) The implementation uses: + +* [`XADD ... 
MAXLEN ~ n`]({{< relref "/commands/xadd" >}}), pipelined, for batch production with a retention cap +* [`XREADGROUP`]({{< relref "/commands/xreadgroup" >}}) with the special ID `>` for fresh deliveries to a consumer +* [`XACK`]({{< relref "/commands/xack" >}}) on every processed entry +* [`XAUTOCLAIM`]({{< relref "/commands/xautoclaim" >}}) for sweeping idle pending entries to a healthy consumer +* [`XRANGE`]({{< relref "/commands/xrange" >}}) for replay and audit +* [`XPENDING`]({{< relref "/commands/xpending" >}}) for inspecting the per-group pending list +* [`XINFO STREAM`]({{< relref "/commands/xinfo-stream" >}}), + [`XINFO GROUPS`]({{< relref "/commands/xinfo-groups" >}}), and + [`XINFO CONSUMERS`]({{< relref "/commands/xinfo-consumers" >}}) for surface-level observability +* [`XTRIM`]({{< relref "/commands/xtrim" >}}) for explicit retention enforcement + +## Producing events + +`ProduceBatch` pipelines `XADD` calls in a single round trip. Each call carries an approximate `MAXLEN ~` cap so the stream stays bounded as it rolls forward: + +```go +func (s *EventStream) ProduceBatch(ctx context.Context, events []ProducerEvent) ([]string, error) { + pipe := s.client.Pipeline() + cmds := make([]*redis.StringCmd, 0, len(events)) + for _, ev := range events { + fields := encodeFields(ev.Type, ev.Payload) + cmd := pipe.XAdd(ctx, &redis.XAddArgs{ + Stream: s.streamKey, + MaxLen: s.maxlenApprox, + Approx: true, + Values: fields, + }) + cmds = append(cmds, cmd) + } + if _, err := pipe.Exec(ctx); err != nil { + return nil, err + } + // ...collect ids from cmds and update stats... +} +``` + +The `~` flavour of `MAXLEN` lets Redis trim at a macro-node boundary, which is much cheaper than exact trimming and is what you want when the cap is a retention *guardrail*, not a hard size constraint. With 300 events produced and `MAXLEN ~ 50`, you might end up with 100 entries left — Redis released the oldest whole macro-node and stopped. The next `XADD` will keep length stable. 
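
The arithmetic behind that 300-to-100 outcome can be sketched in a few lines of Go. This is a simplified model, not Redis's implementation: it assumes every macro-node holds exactly 100 entries (the default `stream-node-max-entries`) and ignores the byte-size limit on nodes. `approxTrim` and `length` are hypothetical names used only for illustration.

```go
package main

import "fmt"

// approxTrim models `MAXLEN ~ maxLen` on a stream whose entries are grouped
// into macro-nodes. nodes[i] is the entry count of the i-th oldest node.
// Assumption (simplified model): the oldest node is dropped as a whole, and
// only if the stream would still hold at least maxLen entries afterwards.
func approxTrim(nodes []int, maxLen int) []int {
	total := length(nodes)
	for len(nodes) > 0 && total-nodes[0] >= maxLen {
		total -= nodes[0]
		nodes = nodes[1:]
	}
	return nodes
}

// length sums the entry counts across all nodes.
func length(nodes []int) int {
	total := 0
	for _, n := range nodes {
		total += n
	}
	return total
}

func main() {
	// 300 entries in three full nodes, trimmed with MAXLEN ~ 50.
	fmt.Println(length(approxTrim([]int{100, 100, 100}, 50))) // prints 100
	// A small stream that fits in one node is left untouched.
	fmt.Println(length(approxTrim([]int{80}, 50))) // prints 80
}
```

In this model the last surviving node is never split, which is why the length settles above the requested cap rather than exactly on it.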
+ +If you genuinely need an exact cap (rare), drop the `Approx: true` field on `XAddArgs`. The performance difference is significant on busy streams. + +## Reading with a consumer group + +Each consumer in a group runs the same `XREADGROUP` loop. The special ID `>` means "deliver entries this group has not yet delivered to *anyone*": + +```go +func (s *EventStream) Consume(ctx context.Context, group, consumer string, count int64, blockMs int64) ([]Entry, error) { + res, err := s.client.XReadGroup(ctx, &redis.XReadGroupArgs{ + Group: group, + Consumer: consumer, + Streams: []string{s.streamKey, ">"}, + Count: count, + Block: time.Duration(blockMs) * time.Millisecond, + }).Result() + if err != nil { + if errors.Is(err, redis.Nil) { + return nil, nil + } + return nil, err + } + return flattenStreams(res), nil +} +``` + +`blockMs` makes the call efficient even when the stream is idle: the client parks on the server until either an entry arrives or the timeout expires, so consumers don't busy-loop. + +Reading with an explicit ID like `0` instead of `>` does something different — it replays entries already delivered to *this* consumer name (its private PEL). That is the canonical recovery path when the same consumer restarts: catch up on its own pending entries first, then resume reading new ones. The helper exposes that path separately as `ConsumeOwnPel`. + +## Acknowledging entries + +Once the consumer has processed an entry, `XACK` tells Redis it can drop the entry from the group's pending list: + +```go +func (s *EventStream) Ack(ctx context.Context, group string, ids []string) (int64, error) { + if len(ids) == 0 { + return 0, nil + } + return s.client.XAck(ctx, s.streamKey, group, ids...).Result() +} +``` + +This is the linchpin of at-least-once delivery: an entry that is never acked stays in the PEL until a claim moves it elsewhere. If your consumer goroutine crashes between processing and ack, the next claim sweep picks the entry back up. 
The one caveat is retention: `XADD MAXLEN ~` and `XTRIM` can release the entry's *payload* even while its ID is still in the PEL. The next `XAUTOCLAIM` returns those IDs in its `deleted` list and removes them from the PEL inside the same command — the entry cannot be retried, so the caller should log it and route to a dead-letter store for audit. The example handles this explicitly in `Autoclaim` further down. + +The trade-off is the opposite of pub/sub: a slow or crashed consumer doesn't lose messages, but it does mean your downstream system must be idempotent. If you process an order twice because the first attempt died after the side effect but before the ack, the second attempt must be safe. + +## Multiple consumer groups, one stream + +The big difference between Redis Streams and a job queue is that any number of independent consumer groups can read the same stream. The demo sets up two groups on `demo:events:orders`: + +```go +_ = stream.EnsureGroup(ctx, "notifications", "0-0") +_ = stream.EnsureGroup(ctx, "analytics", "0-0") +``` + +Each group has its own cursor. Producing 5 events results in `notifications` and `analytics` each receiving all 5, with no coordination between them. Within `notifications`, the work is split across `worker-a` and `worker-b`: Redis hands each `XREADGROUP` call whatever entries are not yet delivered to anyone in the group, so adding a second worker doubles throughput without any rebalance logic. + +The `"0-0"` argument means "deliver everything in the stream from the beginning" — useful in a demo and for fresh groups bootstrapping from history. In production, a brand-new group reading a long-existing stream usually starts at `$` ("only events after this point") and uses [`XRANGE`]({{< relref "/commands/xrange" >}}) explicitly if it needs history. 
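
If a group that starts at `$` needs only a bounded slice of history, the `XRANGE` bounds can be derived from wall-clock time, because stream IDs have the form `{milliseconds}-{sequence}`. The sketch below is a minimal illustration under that assumption; `idBounds` and `lastWindow` are hypothetical helper names, not part of the example's `EventStream` type. It relies on Redis completing an end ID given as bare milliseconds with the maximum sequence number, so the final millisecond is included in full.

```go
package main

import (
	"fmt"
	"time"
)

// idBounds turns a millisecond time range into XRANGE start and end IDs.
// The start uses sequence 0; the end is left as bare milliseconds so that
// Redis treats it as the last possible entry in that millisecond.
func idBounds(fromMs, toMs int64) (start, end string) {
	start = fmt.Sprintf("%d-0", fromMs)
	end = fmt.Sprintf("%d", toMs)
	return start, end
}

// lastWindow returns the ID bounds covering the given duration before now.
func lastWindow(now time.Time, window time.Duration) (start, end string) {
	return idBounds(now.Add(-window).UnixMilli(), now.UnixMilli())
}

func main() {
	start, end := lastWindow(time.Now(), 15*time.Minute)
	// Show the equivalent redis-cli command for the last 15 minutes.
	fmt.Println("XRANGE demo:events:orders", start, end)
}
```

The two strings can be passed as the start and end arguments of the `Replay` helper in place of `-` and `+`.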
+ +## Recovering crashed consumers with XAUTOCLAIM + +The demo's "Crash next 3" button tells a chosen consumer to drop its next three deliveries on the floor without acking them — the same effect as a worker process dying mid-message. Those entries stay in the group's PEL with their delivery counter incremented. Once they have been idle for at least `claimMinIdleMs`, any healthy consumer in the group can rescue them by calling `XAUTOCLAIM` *with itself as the target*. `ConsumerWorker.ReapIdlePel` wraps that pattern: + +```go +func (w *ConsumerWorker) ReapIdlePel(ctx context.Context) (ReapResult, error) { + claimed, deleted, err := w.stream.Autoclaim(ctx, w.group, w.name, 100, "0-0", 10) + if err != nil { + return ReapResult{}, err + } + processed := 0 + for _, entry := range claimed { + if perErr := w.handleEntry(ctx, entry.ID, entry.Fields); perErr != nil { + log.Printf("[%s/%s] reap failed on %s: %v", w.group, w.name, entry.ID, perErr) + continue + } + processed++ + } + // ...update reaped counter and return... + return ReapResult{ + Claimed: len(claimed), + Processed: processed, + DeletedIDs: deleted, + }, nil +} +``` + +The underlying `stream.Autoclaim` helper pages through the group's PEL with `XAUTOCLAIM`'s continuation cursor: + +```go +func (s *EventStream) Autoclaim( + ctx context.Context, group, consumer string, + pageCount int64, startID string, maxPages int, +) ([]Entry, []string, error) { + cursor := startID + claimedAll, deletedAll := []Entry{}, []string{} + for i := 0; i < maxPages; i++ { + nextCursor, claimed, deleted, err := s.doAutoclaim(ctx, group, consumer, cursor, pageCount) + if err != nil { + return nil, nil, err + } + claimedAll = append(claimedAll, claimed...) + deletedAll = append(deletedAll, deleted...) 
+ if nextCursor == "0-0" { + break + } + cursor = nextCursor + } + return claimedAll, deletedAll, nil +} +``` + +A single `XAUTOCLAIM` call scans up to `pageCount` PEL entries starting at `startID`, reassigns the ones idle for at least `claimMinIdleMs` to the named consumer, and returns a continuation cursor in the first slot of the reply. For a full sweep, loop until the cursor returns to `0-0` (with a `maxPages` safety net so one call cannot monopolise a very large PEL). The delivery counter is incremented on every claim — after a few cycles you can use it to spot a *poison-pill* message that crashes every consumer that touches it, and route it to a dead-letter stream so the bad entry stops cycling. (New entries keep flowing past the poison pill — `XREADGROUP >` still delivers fresh work — but the bad entry's repeated reclaim wastes consumer time and keeps the PEL larger than it needs to be.) + +The `deleted` list contains PEL entry IDs whose stream payload was already trimmed by the time the claim ran (typically because `MAXLEN ~` retention outran a slow consumer). `XAUTOCLAIM` removes those dangling slots from the PEL itself, so the caller does *not* need to `XACK` them — but the entries cannot be retried either, so log and route them to a dead-letter store for offline inspection. Redis 7.0 introduced this third return element; the example requires Redis 7.0+ for that reason. + +`go-redis` v9's typed `client.XAutoClaim` wrapper discards the third (deleted-IDs) element of the reply, so the helper issues `XAUTOCLAIM` through `client.Do(...)` and parses the raw reply itself. If you don't need the deleted-IDs list, the typed wrapper is more ergonomic. + +`ReapIdlePel` is the right primitive for the recovery path because it claims and processes in one step: every entry the call returned is now in *this* consumer's PEL, so the same consumer is responsible for processing and acking it. 
In production each consumer goroutine runs `ReapIdlePel` periodically (every few seconds, on a timer) so a crashed peer's entries never sit invisibly. The demo exposes it as a manual button so you can trigger the reap after waiting for the idle threshold. + +`XCLAIM` (singular, no auto) does the same thing for a specific list of entry IDs you already have in hand — useful when you want to take ownership of one known stuck entry, or when you need to move a specific consumer's PEL to a peer (the case the demo's "Remove consumer" button handles via `HandoverPending`). `XAUTOCLAIM` cannot filter by source consumer, so it cannot be used for a per-consumer handover. + +## Replay with XRANGE + +`XRANGE` reads a slice of history. It is completely independent of any consumer group — no cursors move, no acks happen — so it is safe to call any number of times, from any process: + +```go +func (s *EventStream) Replay(ctx context.Context, startID, endID string, count int64) ([]Entry, error) { + msgs, err := s.client.XRangeN(ctx, s.streamKey, startID, endID, count).Result() + if err != nil { + if errors.Is(err, redis.Nil) { + return nil, nil + } + return nil, err + } + return xMessagesToEntries(msgs), nil +} +``` + +The special IDs `-` and `+` mean "from the very beginning" and "to the very end". You can also pass real IDs (`1716998413541-0`) or just the millisecond part (`1716998413541`, which Redis interprets as "any entry with this timestamp"). + +Typical uses: + +* **Bootstrapping a new projection** — read the entire stream from `-` and build a derived view in another store (a search index, a SQL table, a different cache). Doing this against a consumer group would consume the entries; `XRANGE` lets you do it without disrupting live consumers. +* **Auditing recent activity** — read the last few minutes by ID range without touching any group cursor. 
+* **Debugging** — fetch one specific entry by its ID, or a tight range around an incident timestamp, to see exactly what producers wrote. + +## The consumer worker goroutine + +`ConsumerWorker` wraps the `XREADGROUP` → process → `XACK` loop in a daemon goroutine +([source](https://github.com/redis/docs/blob/main/content/develop/use-cases/streaming/go/consumer_worker.go)): + +```go +func (w *ConsumerWorker) run(ctx context.Context, done chan struct{}) { + defer close(done) + for { + if ctx.Err() != nil { + return + } + // ...park here if paused... + entries, err := w.stream.Consume(ctx, w.group, w.name, 10, 500) + if err != nil { + if ctx.Err() != nil { + return + } + log.Printf("[%s/%s] read failed: %v", w.group, w.name, err) + // ...short back-off... + continue + } + for _, entry := range entries { + if ctx.Err() != nil { + return + } + w.dispatch(ctx, entry.ID, entry.Fields) + } + } +} +``` + +`handleEntry` either acks (the normal path) or, when the demo has asked the worker to "crash", drops the entry on the floor and increments a counter so the UI can show what is currently in the PEL waiting to be claimed. + +Recovery of stuck PEL entries — this consumer's, after a restart, or another consumer's, after a crash — runs through a separate `ReapIdlePel` method rather than the read loop. That method calls `XAUTOCLAIM` with this consumer as the target, then processes whatever was claimed in the same flow as new entries. This is the textbook Streams pattern: each consumer is its own reaper, running `XAUTOCLAIM(self)` periodically (or on demand) so a crashed peer's entries never sit invisibly in the PEL. The demo's "XAUTOCLAIM to selected" button calls `ReapIdlePel` on the chosen consumer; in production you would run it from a timer every few seconds. + +Note that the worker's main read loop deliberately does *not* call `XREADGROUP 0` to drain its own PEL on every iteration. 
That would re-deliver every pending entry continuously and *reset its idle counter to zero* each time, which would keep crashed entries below the `XAUTOCLAIM` threshold forever. Using `XAUTOCLAIM(self)` as the recovery primitive — which only fires for entries idle longer than `claimMinIdleMs` — avoids that whole class of bug. + +The pause and crash levers exist only for the demo. A real consumer is just the read-process-ack loop — everything else in this type is instrumentation. + +A per-entry failure (typically `XACK` against Redis) must not kill the goroutine — that would silently halt this consumer while every other entry sat in its PEL waiting for `XAUTOCLAIM`. The dispatch wrapper logs the failure and continues; the entry stays unacked and the next `ReapIdlePel` call recovers it once it exceeds the idle threshold. + +## Prerequisites + +* Redis 7.0 or later. `XAUTOCLAIM` was added in Redis 6.2, but its reply gained a third + element (the list of deleted IDs) in 7.0; the example relies on that shape. +* Go 1.21 or later. +* The `go-redis` client. The included `go.mod` pins: + + ```text + require github.com/redis/go-redis/v9 v9.18.0 + ``` + +If your Redis server is running elsewhere, start the demo with `--redis-host` and `--redis-port`. + +## Running the demo + +### Get the source files + +The demo consists of five files. Download them from the [`go` source folder](https://github.com/redis/docs/tree/main/content/develop/use-cases/streaming/go) on GitHub, or grab them with `curl`: + +```bash +mkdir streaming-demo && cd streaming-demo +BASE=https://raw.githubusercontent.com/redis/docs/main/content/develop/use-cases/streaming/go +curl -O $BASE/event_stream.go +curl -O $BASE/consumer_worker.go +curl -O $BASE/demo_server.go +curl -O $BASE/go.mod +curl -O $BASE/go.sum +``` + +### Start the demo server + +The helper, consumer worker, and demo HTTP handlers all live in `package streaming`. 
Go's `package main` can't live in the same directory as another package, so create a tiny `main.go` shim in a subdirectory that calls into the package: + +```bash +mkdir -p cmd/demo +cat > cmd/demo/main.go <<'EOF' +package main + +import "streaming" + +func main() { streaming.RunDemoServer() } +EOF +``` + +Then build and run: + +```bash +go mod tidy +go run ./cmd/demo +``` + +You should see: + +```text +Deleting any existing data at key 'demo:events:orders' for a clean demo run (pass --no-reset to keep it). +Redis streaming demo server listening on http://127.0.0.1:8083 +Using Redis at localhost:6379 with stream key 'demo:events:orders' (MAXLEN ~ 2000) +Seeded 3 consumer(s) across 2 group(s) +``` + +By default the demo wipes the configured stream key on startup so each run starts from a clean state. Pass `--no-reset` to keep any existing data at the key (useful when re-running against the same stream to inspect prior state), or `--stream-key ` to point the demo at a different key entirely. + +Open [http://127.0.0.1:8083](http://127.0.0.1:8083) in a browser. You can: + +* **Produce** any number of events of a chosen type (or random types). Watch the stream length grow and the tail update. +* See each **consumer group**: its `last-delivered-id`, the size of its pending list, and the consumers in it. Each consumer shows its processed count, pending count, and idle time. +* **Add or remove** consumers within a group at runtime to see Redis split the work across the new shape. +* Click **Crash next 3** on a consumer to drop its next three deliveries — the same effect as a worker process dying after `XREADGROUP` but before `XACK`. Watch the **Pending entries (XPENDING)** panel fill up. +* Wait until the idle time exceeds the threshold (default 5000 ms), pick a healthy target consumer, and click **XAUTOCLAIM to selected** — the stuck entries are reassigned and the delivery counter increments. 
+* **Replay (XRANGE)** any range to confirm the full history is independent of consumer-group state. +* **XTRIM** with an approximate `MAXLEN` to bound retention. Note that an approximate trim only releases whole macro-nodes — `MAXLEN ~ 50` on a small stream may not delete anything; on a 300-entry stream it typically lands at around 100. +* Click **Reset demo** to drop the stream and re-seed the default groups. + +## Production usage + +### Pick retention by length or by minimum ID + +The demo uses `MAXLEN ~` on every `XADD`. Two alternatives are worth considering: + +* `MINID ~ ` — keep only entries newer than an ID. If you want "the last 24 hours", compute the wall-clock cutoff and pass `XTRIM MINID ~ -0` (the helper exposes this as `TrimMinid`). This is the right pattern when retention is time-bounded. +* No cap on `XADD` plus a periodic `XTRIM` job — useful if your producer is hot and the per-`XADD` work has to stay minimal, or if retention rules are complex (a separate process can also factor in consumer-group lag). + +In all three cases the trimming is approximate by default. Use exact trimming (`XTrimMaxLen` / `XTrimMinID` instead of the `Approx` variants) only when you genuinely need an exact count. + +### Don't let consumer-group lag silently grow + +`XINFO GROUPS` reports each group's `lag` (entries the group has not yet read) and `pending` (entries delivered but not acked). In production, alert on either of these crossing a threshold — a steadily growing pending count usually means consumers are crashing without `XAUTOCLAIM` running, and a growing lag means consumers can't keep up with producers. + +The same applies inside a group: `XINFO CONSUMERS` reports per-consumer pending counts and idle times, so you can spot one slow consumer holding entries that the rest of the group is waiting on. + +### Make consumer logic idempotent + +`XAUTOCLAIM` can re-deliver an entry to a different consumer after a crash. 
If your processing has side effects (sending email, charging a card, updating a downstream store), make sure the same entry processed twice gives the same result — use an idempotency key, an upsert with conditional check, or a once-per-id guard table. Redis Streams cannot give you exactly-once semantics on its own. + +### Bound the delivery counter as a poison-pill signal + +`XPENDING` returns each entry's delivery count, incremented on every claim. If an entry has been delivered (and dropped) several times, the next consumer is unlikely to fare better. After some threshold — `deliveries >= 5`, say — route the entry to a *dead-letter stream*, ack it on the original group, and alert. New entries keep flowing past a poison pill (`XREADGROUP >` still delivers fresh work), but the bad entry's repeated reclaim wastes consumer time and keeps the PEL bigger than it needs to be — without a DLQ threshold it can also slowly trip retention/lag alerts. + +### Partition by tenant or entity for scale + +A single Redis Stream is a single key, and on a Redis Cluster a single key lives on a single shard. If your throughput exceeds what one shard can handle, partition the stream — for example by tenant ID (`events:orders:{tenant_a}`, `events:orders:{tenant_b}`) — so different tenants land on different shards. Hash-tags (`{tenant_a}`) keep all related streams on the same shard if you need to multi-stream atomically. + +Per-entity partitioning (`events:order:{order_id}`) is the canonical pattern when you treat each entity's stream as the event-sourcing log for that entity: every state change for one order goes on its own stream, which is also bounded in size by the entity's lifetime. + +### Use a separate consumer pool per group + +The demo runs every consumer in one process. In production each consumer group is usually its own deployment — its own pool of pods or VMs — so a slow projection in `analytics` cannot pull `notifications` workers off their stream. 
Each pod runs one consumer goroutine per CPU core, with `XAUTOCLAIM` either embedded in the consumer loop (every N reads, claim idle entries to self) or run by a separate reaper. + +### Don't read with XREAD (no group) and then try to ack + +`XREAD` and `XREADGROUP` are different mechanisms. `XREAD` is a tail-the-log read with no consumer-group state — entries are not added to any PEL, and you cannot `XACK` them. If you want at-least-once delivery and crash recovery, you must read through a consumer group. + +`XREAD` is still useful for read-only tail clients (a UI streaming events, a debugger, a `tail -f`-style command-line tool). It's just not part of the at-least-once path. + +### Wire shutdown through `context.Context` + +Each consumer goroutine receives a `context.Context` that the worker's `Stop()` cancels. The demo's `RunDemoServer` traps `SIGINT`/`SIGTERM`, shuts the HTTP server down with a 5-second deadline, then stops every consumer goroutine. Wire your real service's `SIGTERM` handler to the same `context.CancelFunc` so an in-flight `XREADGROUP` block unblocks promptly and any half-processed entry stays in the PEL for the next reaper to recover. + +### go-redis XAutoClaim does not surface deleted IDs + +In `go-redis` v9, the typed `client.XAutoClaim(ctx, ...)` wrapper reads the deleted-IDs slot of the reply and discards it before returning to the caller. The same is true of `XAutoClaimJustID`. The helper in this guide uses `client.Do(ctx, "XAUTOCLAIM", ...)` and parses the raw reply itself so the deleted-IDs list is preserved. If you don't need the deleted-IDs list (for example, because you don't trim aggressively and your `MAXLEN ~` cap is comfortably larger than any consumer's worst-case PEL size), the typed wrapper is more ergonomic. 
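
As a rough illustration of the raw-reply parsing described above, here is a minimal sketch. It is not the guide's actual `doAutoclaim` helper: `ClaimedEntry` and `ParseAutoclaimReply` are hypothetical names, and the sketch assumes the reply arrives as nested `[]interface{}` values with string leaves, in the order cursor, claimed entries, deleted IDs. Check the shape your client and protocol version actually return before relying on it.

```go
package main

import (
	"errors"
	"fmt"
)

// ClaimedEntry is a hypothetical stand-in for the guide's Entry type.
type ClaimedEntry struct {
	ID     string
	Fields map[string]string
}

// ParseAutoclaimReply decodes a raw XAUTOCLAIM reply of the assumed shape
// [cursor, [[id, [field, value, ...]], ...], [deletedID, ...]].
// The third element is optional so that a two-element reply (Redis 6.2)
// still parses, with an empty deleted list.
func ParseAutoclaimReply(reply interface{}) (string, []ClaimedEntry, []string, error) {
	top, ok := reply.([]interface{})
	if !ok || len(top) < 2 {
		return "", nil, nil, errors.New("xautoclaim: unexpected reply shape")
	}
	cursor, ok := top[0].(string)
	if !ok {
		return "", nil, nil, fmt.Errorf("xautoclaim: cursor is %T, want string", top[0])
	}
	rawEntries, ok := top[1].([]interface{})
	if !ok {
		return "", nil, nil, fmt.Errorf("xautoclaim: entries are %T, want array", top[1])
	}

	claimed := make([]ClaimedEntry, 0, len(rawEntries))
	for _, raw := range rawEntries {
		pair, ok := raw.([]interface{})
		if !ok || len(pair) != 2 {
			continue // skip nil or malformed slots rather than failing the sweep
		}
		id, ok := pair[0].(string)
		if !ok {
			continue
		}
		// Fields arrive as a flat [field, value, field, value, ...] list.
		flat, _ := pair[1].([]interface{})
		fields := make(map[string]string, len(flat)/2)
		for i := 0; i+1 < len(flat); i += 2 {
			k, kOK := flat[i].(string)
			v, vOK := flat[i+1].(string)
			if kOK && vOK {
				fields[k] = v
			}
		}
		claimed = append(claimed, ClaimedEntry{ID: id, Fields: fields})
	}

	// Deleted IDs: PEL slots whose stream payload was already trimmed.
	deleted := []string{}
	if len(top) >= 3 {
		if rawDeleted, ok := top[2].([]interface{}); ok {
			for _, d := range rawDeleted {
				if id, ok := d.(string); ok {
					deleted = append(deleted, id)
				}
			}
		}
	}
	return cursor, claimed, deleted, nil
}
```

With `go-redis`, the input would typically be the value returned by `client.Do(ctx, "XAUTOCLAIM", ...).Result()`. Keeping the parser separate from the network call makes the reply handling easy to unit-test with hand-built slices, without a running Redis server.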
+ +### Inspect the stream directly with redis-cli + +When testing or troubleshooting, inspect the stream directly to confirm the consumer state is what you expect: + +```bash +# Stream summary +redis-cli XLEN demo:events:orders +redis-cli XINFO STREAM demo:events:orders + +# Group cursors and pending counts +redis-cli XINFO GROUPS demo:events:orders + +# Consumers within a group +redis-cli XINFO CONSUMERS demo:events:orders notifications + +# Pending entries with idle time and delivery count +redis-cli XPENDING demo:events:orders notifications - + 20 + +# Tail the stream live (no consumer-group state — like tail -f) +redis-cli XREAD BLOCK 0 STREAMS demo:events:orders '$' + +# Replay a range +redis-cli XRANGE demo:events:orders - + COUNT 50 +``` + +If a group's `lag` is growing while consumers' `idle` times are short, consumers are healthy but producers are outpacing them — add more consumers. If `pending` is growing while `lag` is small, consumers are *receiving* entries but not *acking* them — either they are crashing mid-message or your acking logic has a bug. + +## Learn more + +This example uses the following Redis commands: + +* [`XADD`]({{< relref "/commands/xadd" >}}) to append an event with an approximate `MAXLEN` cap. +* [`XREADGROUP`]({{< relref "/commands/xreadgroup" >}}) to read new entries for a consumer in a group. +* [`XACK`]({{< relref "/commands/xack" >}}) to acknowledge a processed entry. +* [`XAUTOCLAIM`]({{< relref "/commands/xautoclaim" >}}) to reassign idle pending entries to a healthy consumer. +* [`XCLAIM`]({{< relref "/commands/xclaim" >}}) to take ownership of a specific list of pending entry IDs by hand (used by `HandoverPending` to move a leaving consumer's PEL to a peer, since `XAUTOCLAIM` has no source-consumer filter). +* [`XRANGE`]({{< relref "/commands/xrange" >}}) for replay and audit, independent of consumer-group state. 
+* [`XPENDING`]({{< relref "/commands/xpending" >}}) to inspect the per-group pending list with idle times and delivery counts. +* [`XTRIM`]({{< relref "/commands/xtrim" >}}) for explicit retention enforcement. +* [`XGROUP CREATE`]({{< relref "/commands/xgroup-create" >}}) and + [`XGROUP DELCONSUMER`]({{< relref "/commands/xgroup-delconsumer" >}}) to manage groups and consumers. +* [`XINFO STREAM`]({{< relref "/commands/xinfo-stream" >}}), + [`XINFO GROUPS`]({{< relref "/commands/xinfo-groups" >}}), and + [`XINFO CONSUMERS`]({{< relref "/commands/xinfo-consumers" >}}) for observability. + +See the [`go-redis` documentation]({{< relref "/develop/clients/go" >}}) for the full client reference, and the [Streams overview]({{< relref "/develop/data-types/streams" >}}) for the deeper conceptual model — consumer groups, the PEL, claim semantics, capped streams, and the differences with Kafka partitions. diff --git a/content/develop/use-cases/streaming/go/consumer_worker.go b/content/develop/use-cases/streaming/go/consumer_worker.go new file mode 100644 index 0000000000..1df19b8ddc --- /dev/null +++ b/content/develop/use-cases/streaming/go/consumer_worker.go @@ -0,0 +1,338 @@ +// Background consumer goroutine for a single consumer in a consumer group. +// +// Each worker owns a goroutine that loops on XREADGROUP > with a short +// block timeout and acks every entry it processes. Recovery of stuck +// PEL entries (this consumer's, or anyone else's) happens through +// ReapIdlePel(), which is the textbook Streams pattern: each consumer +// periodically (or on demand) calls XAUTOCLAIM with itself as the +// target, then processes whatever it claimed. The demo's "XAUTOCLAIM +// to selected" button is exactly that call. +// +// Two demo-only levers are wired into the loop: +// +// - Pause() parks the worker (so its pending entries age into the +// XAUTOCLAIM window without being consumed by ">" reads). 
+// - CrashNext(n) tells the worker to drop its next n deliveries on +// the floor without acking them — the same effect as a worker +// process dying mid-message. Those entries stay in the group's +// PEL until ReapIdlePel recovers them. +// +// Real consumers do not need either lever; they only need +// XREADGROUP → process → XACK in run() and a periodic ReapIdlePel call +// to recover stuck entries. + +package streaming + +import ( + "context" + "log" + "sync" + "time" +) + +// RecentEntry is one slot in the worker's recent-deliveries buffer. +type RecentEntry struct { + ID string `json:"id"` + Type string `json:"type"` + Fields map[string]string `json:"fields"` + Acked bool `json:"acked"` + Note string `json:"note,omitempty"` +} + +// ConsumerStatus is the JSON-friendly view of one worker's state. +type ConsumerStatus struct { + Name string `json:"name"` + Group string `json:"group"` + Processed int64 `json:"processed"` + Reaped int64 `json:"reaped"` + CrashedDrops int64 `json:"crashed_drops"` + Paused bool `json:"paused"` + CrashQueued int `json:"crash_queued"` + Alive bool `json:"alive"` +} + +// ReapResult is what ReapIdlePel returns. +type ReapResult struct { + Claimed int `json:"claimed"` + Processed int `json:"processed"` + DeletedIDs []string `json:"deleted_ids"` +} + +// ConsumerWorker is one consumer in a consumer group, running on its +// own goroutine. +type ConsumerWorker struct { + stream *EventStream + group string + name string + processLatencyMs int + recentCap int + + mu sync.Mutex + processed int64 + reaped int64 + crashedDrops int64 + crashNext int + paused bool + recent []RecentEntry + + startStopMu sync.Mutex + cancel context.CancelFunc + done chan struct{} + alive bool +} + +// NewConsumerWorker constructs a worker. processLatencyMs simulates +// per-entry processing time so the demo's UI has something to show. 
+func NewConsumerWorker(stream *EventStream, group, name string) *ConsumerWorker { + return &ConsumerWorker{ + stream: stream, + group: group, + name: name, + processLatencyMs: 25, + recentCap: 20, + recent: make([]RecentEntry, 0, 20), + } +} + +// Start spawns the worker goroutine if it isn't already running. +func (w *ConsumerWorker) Start() { + w.startStopMu.Lock() + defer w.startStopMu.Unlock() + if w.alive { + return + } + ctx, cancel := context.WithCancel(context.Background()) + done := make(chan struct{}) + w.cancel = cancel + w.done = done + w.alive = true + go w.run(ctx, done) +} + +// Stop signals the worker to exit and waits up to joinTimeout for the +// goroutine to finish. +func (w *ConsumerWorker) Stop(joinTimeout time.Duration) { + w.startStopMu.Lock() + if !w.alive { + w.startStopMu.Unlock() + return + } + cancel := w.cancel + done := w.done + w.startStopMu.Unlock() + + cancel() + select { + case <-done: + case <-time.After(joinTimeout): + } + w.startStopMu.Lock() + w.alive = false + w.startStopMu.Unlock() +} + +// Pause parks the read loop. New deliveries stop arriving for this +// consumer; entries already in its PEL age into the XAUTOCLAIM window. +func (w *ConsumerWorker) Pause() { + w.mu.Lock() + w.paused = true + w.mu.Unlock() +} + +// Resume re-enables the read loop. +func (w *ConsumerWorker) Resume() { + w.mu.Lock() + w.paused = false + w.mu.Unlock() +} + +// CrashNext tells the worker to drop the next count deliveries on the +// floor without acking them. The entries stay in the group's PEL with +// their delivery counter incremented, so XAUTOCLAIM can recover them +// once they exceed the idle threshold. +func (w *ConsumerWorker) CrashNext(count int) { + if count < 0 { + count = 0 + } + w.mu.Lock() + w.crashNext += count + w.mu.Unlock() +} + +// Recent returns the worker's recent-deliveries buffer (newest first). 
+func (w *ConsumerWorker) Recent() []RecentEntry { + w.mu.Lock() + defer w.mu.Unlock() + out := make([]RecentEntry, len(w.recent)) + copy(out, w.recent) + return out +} + +// Status returns the JSON-friendly view of the worker. +func (w *ConsumerWorker) Status() ConsumerStatus { + w.mu.Lock() + defer w.mu.Unlock() + w.startStopMu.Lock() + alive := w.alive + w.startStopMu.Unlock() + return ConsumerStatus{ + Name: w.name, + Group: w.group, + Processed: w.processed, + Reaped: w.reaped, + CrashedDrops: w.crashedDrops, + Paused: w.paused, + CrashQueued: w.crashNext, + Alive: alive, + } +} + +// ReapIdlePel runs XAUTOCLAIM into self and processes the claimed entries. +// +// Returns a summary with claimed, processed, and the list of deleted +// IDs (PEL entries whose stream payload was already trimmed before the +// sweep ran). Redis 7+ removes deleted IDs from the PEL inside +// XAUTOCLAIM itself, so the caller does not have to XACK them; they +// are reported so the caller can route them to a dead-letter store. +func (w *ConsumerWorker) ReapIdlePel(ctx context.Context) (ReapResult, error) { + claimed, deleted, err := w.stream.Autoclaim(ctx, w.group, w.name, 100, "0-0", 10) + if err != nil { + return ReapResult{}, err + } + processed := 0 + for _, entry := range claimed { + if perErr := w.handleEntry(ctx, entry.ID, entry.Fields); perErr != nil { + log.Printf("[%s/%s] reap failed on %s: %v", w.group, w.name, entry.ID, perErr) + continue + } + processed++ + } + w.mu.Lock() + w.reaped += int64(processed) + w.mu.Unlock() + return ReapResult{ + Claimed: len(claimed), + Processed: processed, + DeletedIDs: deleted, + }, nil +} + +// run is the main loop: XREADGROUP > → process → XACK. +// +// A per-entry error (typically a failed XACK) must NOT kill the +// goroutine; that would silently halt this consumer while every other +// entry it owns sat in the PEL waiting for XAUTOCLAIM. 
We log and +// move on; the entry stays unacked and the next reap recovers it once +// it exceeds the idle threshold. +func (w *ConsumerWorker) run(ctx context.Context, done chan struct{}) { + defer close(done) + for { + if ctx.Err() != nil { + return + } + w.mu.Lock() + paused := w.paused + w.mu.Unlock() + if paused { + select { + case <-ctx.Done(): + return + case <-time.After(50 * time.Millisecond): + } + continue + } + + entries, err := w.stream.Consume(ctx, w.group, w.name, 10, 500) + if err != nil { + if ctx.Err() != nil { + return + } + log.Printf("[%s/%s] read failed: %v", w.group, w.name, err) + select { + case <-ctx.Done(): + return + case <-time.After(500 * time.Millisecond): + } + continue + } + + for _, entry := range entries { + if ctx.Err() != nil { + return + } + w.dispatch(ctx, entry.ID, entry.Fields) + } + } +} + +func (w *ConsumerWorker) dispatch(ctx context.Context, id string, fields map[string]string) { + if w.processLatencyMs > 0 { + select { + case <-ctx.Done(): + return + case <-time.After(time.Duration(w.processLatencyMs) * time.Millisecond): + } + } + if err := w.handleEntry(ctx, id, fields); err != nil { + // Per-entry failure: log, leave unacked for the next reap, but + // keep this consumer alive so its other entries don't pile up. + log.Printf("[%s/%s] failed to handle %s: %v", w.group, w.name, id, err) + w.pushRecent(RecentEntry{ + ID: id, + Type: fields["type"], + Fields: fields, + Acked: false, + Note: "handler error: " + err.Error(), + }) + } +} + +// handleEntry is the per-entry processor. The "crash" path drops the +// entry without acking it (incrementing crashedDrops); the normal path +// XACKs it. Returns an error only if XACK itself failed — a dropped +// entry is not an error. 
+func (w *ConsumerWorker) handleEntry(ctx context.Context, id string, fields map[string]string) error { + w.mu.Lock() + drop := w.crashNext > 0 + if drop { + w.crashNext-- + w.crashedDrops++ + } + w.mu.Unlock() + + if drop { + w.pushRecent(RecentEntry{ + ID: id, + Type: fields["type"], + Fields: fields, + Acked: false, + Note: "dropped (simulated crash)", + }) + return nil + } + + if _, err := w.stream.Ack(ctx, w.group, []string{id}); err != nil { + return err + } + w.mu.Lock() + w.processed++ + w.mu.Unlock() + w.pushRecent(RecentEntry{ + ID: id, + Type: fields["type"], + Fields: fields, + Acked: true, + }) + return nil +} + +// pushRecent inserts an entry at the head of the recent buffer, +// trimming the tail when it exceeds capacity. +func (w *ConsumerWorker) pushRecent(entry RecentEntry) { + w.mu.Lock() + defer w.mu.Unlock() + w.recent = append([]RecentEntry{entry}, w.recent...) + if len(w.recent) > w.recentCap { + w.recent = w.recent[:w.recentCap] + } +} diff --git a/content/develop/use-cases/streaming/go/demo_server.go b/content/develop/use-cases/streaming/go/demo_server.go new file mode 100644 index 0000000000..493c1b22d2 --- /dev/null +++ b/content/develop/use-cases/streaming/go/demo_server.go @@ -0,0 +1,1167 @@ +// Redis streaming demo server. +// +// Create a main.go shim in a subdirectory (Go's package main cannot +// live in the same directory as package streaming): +// +// mkdir -p cmd/demo +// cat > cmd/demo/main.go <<'EOF' +// package main +// +// import "streaming" +// +// func main() { streaming.RunDemoServer() } +// EOF +// +// Then build and run: +// +// go mod tidy +// go run ./cmd/demo --port 8083 +// +// Visit http://localhost:8083 to watch a Redis Stream in action: +// producers append events to a single stream, two independent consumer +// groups read the same stream at their own pace, and within the +// "notifications" group two consumers share the work. 
Crash a consumer +// to drop deliveries mid-process, run XAUTOCLAIM to reassign the stuck +// entries, replay any ID range with XRANGE, and trim retention with +// XTRIM. +package streaming + +import ( + "context" + "encoding/json" + "errors" + "flag" + "fmt" + "log" + "math/rand" + "net/http" + "os" + "os/signal" + "sort" + "strconv" + "strings" + "sync" + "syscall" + "time" + + "github.com/redis/go-redis/v9" +) + +// EventTypes are the four order-event types the demo produces. +var EventTypes = []string{ + "order.placed", + "order.paid", + "order.shipped", + "order.cancelled", +} + +// DefaultGroups is the seed topology — group → consumer names. +var DefaultGroups = []groupSeed{ + {Name: "notifications", Consumers: []string{"worker-a", "worker-b"}}, + {Name: "analytics", Consumers: []string{"worker-c"}}, +} + +type groupSeed struct { + Name string + Consumers []string +} + +// StreamingDemo is the in-memory registry of consumer workers. +// +// http.ServeMux dispatches every HTTP request on a fresh goroutine, so +// any code that mutates workers (or iterates while another handler is +// mutating it) needs the lock. +type StreamingDemo struct { + stream *EventStream + mu sync.Mutex + workers map[workerKey]*ConsumerWorker +} + +type workerKey struct { + Group string + Name string +} + +// NewStreamingDemo constructs an empty demo around the given stream. +func NewStreamingDemo(stream *EventStream) *StreamingDemo { + return &StreamingDemo{ + stream: stream, + workers: make(map[workerKey]*ConsumerWorker), + } +} + +// Seed creates the default groups and consumer workers. Returns the +// total number of workers seeded. 
+func (d *StreamingDemo) Seed(ctx context.Context, groups []groupSeed) (int, error) { + d.mu.Lock() + defer d.mu.Unlock() + total := 0 + for _, g := range groups { + if err := d.stream.EnsureGroup(ctx, g.Name, "0-0"); err != nil { + return total, err + } + for _, name := range g.Consumers { + key := workerKey{Group: g.Name, Name: name} + if _, ok := d.workers[key]; ok { + continue + } + worker := NewConsumerWorker(d.stream, g.Name, name) + worker.Start() + d.workers[key] = worker + total++ + } + } + return total, nil +} + +// AddWorker adds one consumer to a group. Returns false if a consumer +// with that group+name pair already exists. +func (d *StreamingDemo) AddWorker(ctx context.Context, group, name string) (bool, error) { + d.mu.Lock() + defer d.mu.Unlock() + key := workerKey{Group: group, Name: name} + if _, ok := d.workers[key]; ok { + return false, nil + } + if err := d.stream.EnsureGroup(ctx, group, "0-0"); err != nil { + return false, err + } + worker := NewConsumerWorker(d.stream, group, name) + worker.Start() + d.workers[key] = worker + return true, nil +} + +// removeWorkerResult mirrors the Python reference's return shape. +type removeWorkerResult struct { + Removed bool `json:"removed"` + Reason string `json:"reason,omitempty"` + Message string `json:"message,omitempty"` + HandedOverTo string `json:"handed_over_to,omitempty"` + HandedOverCount int `json:"handed_over_count,omitempty"` +} + +// RemoveWorker safely removes a consumer. +// +// XGROUP DELCONSUMER destroys the consumer's PEL entries outright, so +// any pending message it still owned would become unreachable. Before +// deleting, hand its PEL off to another consumer in the same group +// with XCLAIM. Without a peer consumer to take over, refuse to delete +// and leave the worker in place so the user can add a peer first. 
+func (d *StreamingDemo) RemoveWorker(ctx context.Context, group, name string) (removeWorkerResult, int) { + d.mu.Lock() + key := workerKey{Group: group, Name: name} + worker, ok := d.workers[key] + if !ok { + d.mu.Unlock() + return removeWorkerResult{Removed: false, Reason: "not-found"}, http.StatusOK + } + peers := []string{} + for k := range d.workers { + if k.Group == group && k.Name != name { + peers = append(peers, k.Name) + } + } + if len(peers) == 0 { + d.mu.Unlock() + return removeWorkerResult{ + Removed: false, + Reason: "no-peer", + Message: fmt.Sprintf( + "%s/%s still owns pending entries and is the only consumer in its group; add another consumer first so its PEL can be handed over before deletion.", + group, name, + ), + }, http.StatusConflict + } + sort.Strings(peers) + handoverTo := peers[0] + d.mu.Unlock() + + // Run the handover BEFORE removing the worker from the registry. + // XGROUP DELCONSUMER would destroy the source's pending list, so + // any handover failure must abort the removal — leaving the worker + // in place lets the user retry once the underlying Redis issue is + // resolved. (The worker keeps consuming during the handover; XCLAIM + // with MIN-IDLE-TIME 0 races acks gracefully — anything the worker + // acks during the window is gone from XPENDING and isn't moved.) + claimed, err := d.stream.HandoverPending(ctx, group, name, handoverTo, 100) + if err != nil { + return removeWorkerResult{ + Removed: false, + Reason: "handover-failed", + Message: fmt.Sprintf( + "Handover from %s/%s to %s failed before XGROUP DELCONSUMER could run: %v. %s/%s is still in the group; retry the remove or investigate the Redis error before deleting (DELCONSUMER would destroy the source consumer's pending entries).", + group, name, handoverTo, err, group, name, + ), + }, http.StatusConflict + } + + // Handover succeeded; now safe to remove from the registry, stop + // the worker, and destroy the consumer record in Redis. 
+ d.mu.Lock() + delete(d.workers, key) + d.mu.Unlock() + worker.Stop(2 * time.Second) + if _, err := d.stream.DeleteConsumer(ctx, group, name); err != nil { + log.Printf("[demo] delconsumer %s/%s: %v", group, name, err) + } + return removeWorkerResult{ + Removed: true, + HandedOverTo: handoverTo, + HandedOverCount: claimed, + }, http.StatusOK +} + +// GetWorker returns the worker for a (group, name) pair, or nil. +func (d *StreamingDemo) GetWorker(group, name string) *ConsumerWorker { + d.mu.Lock() + defer d.mu.Unlock() + return d.workers[workerKey{Group: group, Name: name}] +} + +// WorkersSnapshot returns a stable list of (key, worker) safe to use +// outside the lock. +func (d *StreamingDemo) WorkersSnapshot() []workerSnap { + d.mu.Lock() + defer d.mu.Unlock() + out := make([]workerSnap, 0, len(d.workers)) + for k, w := range d.workers { + out = append(out, workerSnap{Key: k, Worker: w}) + } + return out +} + +type workerSnap struct { + Key workerKey + Worker *ConsumerWorker +} + +// StopAll stops every worker. Used by reset and shutdown. +func (d *StreamingDemo) StopAll() { + d.mu.Lock() + workers := make([]*ConsumerWorker, 0, len(d.workers)) + for _, w := range d.workers { + workers = append(workers, w) + } + d.workers = make(map[workerKey]*ConsumerWorker) + d.mu.Unlock() + for _, w := range workers { + w.Stop(2 * time.Second) + } +} + +// Reset stops every worker, drops the stream, resets stats, and re-seeds. 
+func (d *StreamingDemo) Reset(ctx context.Context) (int, error) { + d.StopAll() + if err := d.stream.DeleteStream(ctx); err != nil { + return 0, err + } + d.stream.ResetStats() + return d.Seed(ctx, DefaultGroups) +} + +// ------------------------------------------------------------------ +// HTTP server +// ------------------------------------------------------------------ + +type httpServer struct { + stream *EventStream + demo *StreamingDemo +} + +func (s *httpServer) writeJSON(w http.ResponseWriter, status int, payload interface{}) { + w.Header().Set("Content-Type", "application/json") + w.WriteHeader(status) + if err := json.NewEncoder(w).Encode(payload); err != nil { + log.Printf("[demo] write json: %v", err) + } +} + +func (s *httpServer) handleRoot(w http.ResponseWriter, r *http.Request) { + if r.URL.Path != "/" && r.URL.Path != "/index.html" { + http.NotFound(w, r) + return + } + html := strings.NewReplacer( + "__STREAM_KEY__", s.stream.StreamKey(), + "__MAXLEN__", fmt.Sprintf("%d", s.stream.MaxlenApprox()), + "__CLAIM_IDLE__", fmt.Sprintf("%d", s.stream.ClaimMinIdleMs()), + ).Replace(htmlTemplate) + w.Header().Set("Content-Type", "text/html; charset=utf-8") + w.WriteHeader(http.StatusOK) + _, _ = w.Write([]byte(html)) +} + +type stateResponse struct { + Stream StreamInfo `json:"stream"` + Tail []Entry `json:"tail"` + Groups []map[string]interface{} `json:"groups"` + Pending []map[string]interface{} `json:"pending"` + Stats Stats `json:"stats"` +} + +type consumerDetail struct { + Name string `json:"name"` + Group string `json:"group"` + Processed int64 `json:"processed"` + Reaped int64 `json:"reaped"` + CrashedDrops int64 `json:"crashed_drops"` + Paused bool `json:"paused"` + CrashQueued int `json:"crash_queued"` + Alive bool `json:"alive"` + Pending int64 `json:"pending"` + IdleMs int64 `json:"idle_ms"` + Recent []RecentEntry `json:"recent"` +} + +func (s *httpServer) handleState(w http.ResponseWriter, r *http.Request) { + ctx := r.Context() + 
streamInfo, _ := s.stream.InfoStream(ctx) + groups, _ := s.stream.InfoGroups(ctx) + workers := s.demo.WorkersSnapshot() + + groupsDetail := make([]map[string]interface{}, 0, len(groups)) + pendingRows := make([]map[string]interface{}, 0) + + for _, group := range groups { + consumerInfos, _ := s.stream.InfoConsumers(ctx, group.Name) + byName := make(map[string]ConsumerInfo, len(consumerInfos)) + for _, ci := range consumerInfos { + byName[ci.Name] = ci + } + + consumers := make([]consumerDetail, 0) + seen := make(map[string]bool) + for _, ws := range workers { + if ws.Key.Group != group.Name { + continue + } + info := byName[ws.Key.Name] + status := ws.Worker.Status() + consumers = append(consumers, consumerDetail{ + Name: status.Name, + Group: status.Group, + Processed: status.Processed, + Reaped: status.Reaped, + CrashedDrops: status.CrashedDrops, + Paused: status.Paused, + CrashQueued: status.CrashQueued, + Alive: status.Alive, + Pending: info.Pending, + IdleMs: info.IdleMs, + Recent: ws.Worker.Recent(), + }) + seen[ws.Key.Name] = true + } + // Surface consumers Redis knows about but our registry doesn't + // (orphans after a process restart, manual XGROUP CREATECONSUMER). 
+ for _, ci := range consumerInfos { + if seen[ci.Name] { + continue + } + consumers = append(consumers, consumerDetail{ + Name: ci.Name, + Group: group.Name, + Pending: ci.Pending, + IdleMs: ci.IdleMs, + Recent: []RecentEntry{}, + }) + } + sort.Slice(consumers, func(i, j int) bool { + return consumers[i].Name < consumers[j].Name + }) + + groupsDetail = append(groupsDetail, map[string]interface{}{ + "name": group.Name, + "consumers": group.Consumers, + "pending": group.Pending, + "last_delivered_id": group.LastDeliveredID, + "lag": group.Lag, + "consumers_detail": consumers, + }) + + pendingDetail, _ := s.stream.PendingDetail(ctx, group.Name, 50) + for _, p := range pendingDetail { + pendingRows = append(pendingRows, map[string]interface{}{ + "group": group.Name, + "id": p.ID, + "consumer": p.Consumer, + "idle_ms": p.IdleMs, + "deliveries": p.Deliveries, + }) + } + } + + tail, _ := s.stream.Tail(ctx, 10) + if tail == nil { + tail = []Entry{} + } + + s.writeJSON(w, http.StatusOK, stateResponse{ + Stream: streamInfo, + Tail: tail, + Groups: groupsDetail, + Pending: pendingRows, + Stats: s.stream.Stats(), + }) +} + +func (s *httpServer) handleProduce(w http.ResponseWriter, r *http.Request) { + if err := r.ParseForm(); err != nil { + s.writeJSON(w, http.StatusBadRequest, map[string]string{"error": "bad form"}) + return + } + count := clamp(parseIntOr(r.FormValue("count"), 1), 1, 500) + eventType := strings.TrimSpace(r.FormValue("type")) + events := make([]ProducerEvent, 0, count) + for i := 0; i < count; i++ { + picked := eventType + if picked == "" { + picked = EventTypes[rand.Intn(len(EventTypes))] + } + events = append(events, ProducerEvent{Type: picked, Payload: fakePayload()}) + } + ids, err := s.stream.ProduceBatch(r.Context(), events) + if err != nil { + s.writeJSON(w, http.StatusInternalServerError, map[string]string{"error": err.Error()}) + return + } + s.writeJSON(w, http.StatusOK, map[string]interface{}{ + "produced": len(ids), + "ids": ids, + }) +} + +func 
(s *httpServer) handleAddWorker(w http.ResponseWriter, r *http.Request) { + if err := r.ParseForm(); err != nil { + s.writeJSON(w, http.StatusBadRequest, map[string]string{"error": "bad form"}) + return + } + group := strings.TrimSpace(r.FormValue("group")) + name := strings.TrimSpace(r.FormValue("name")) + if group == "" || name == "" { + s.writeJSON(w, http.StatusBadRequest, map[string]string{"error": "group and name are required"}) + return + } + added, err := s.demo.AddWorker(r.Context(), group, name) + if err != nil { + s.writeJSON(w, http.StatusInternalServerError, map[string]string{"error": err.Error()}) + return + } + if !added { + s.writeJSON(w, http.StatusConflict, map[string]string{"error": fmt.Sprintf("%s/%s already exists", group, name)}) + return + } + s.writeJSON(w, http.StatusOK, map[string]string{"group": group, "name": name}) +} + +func (s *httpServer) handleRemoveWorker(w http.ResponseWriter, r *http.Request) { + if err := r.ParseForm(); err != nil { + s.writeJSON(w, http.StatusBadRequest, map[string]string{"error": "bad form"}) + return + } + group := strings.TrimSpace(r.FormValue("group")) + name := strings.TrimSpace(r.FormValue("name")) + result, status := s.demo.RemoveWorker(r.Context(), group, name) + s.writeJSON(w, status, result) +} + +func (s *httpServer) handleCrash(w http.ResponseWriter, r *http.Request) { + if err := r.ParseForm(); err != nil { + s.writeJSON(w, http.StatusBadRequest, map[string]string{"error": "bad form"}) + return + } + group := strings.TrimSpace(r.FormValue("group")) + name := strings.TrimSpace(r.FormValue("name")) + count := parseIntOr(r.FormValue("count"), 1) + worker := s.demo.GetWorker(group, name) + if worker == nil { + s.writeJSON(w, http.StatusNotFound, map[string]string{ + "error": fmt.Sprintf("unknown consumer %s/%s", group, name), + }) + return + } + worker.CrashNext(count) + s.writeJSON(w, http.StatusOK, map[string]int{"queued": count}) +} + +func (s *httpServer) handleAutoclaim(w http.ResponseWriter, r 
*http.Request) { + if err := r.ParseForm(); err != nil { + s.writeJSON(w, http.StatusBadRequest, map[string]string{"error": "bad form"}) + return + } + group := strings.TrimSpace(r.FormValue("group")) + consumer := strings.TrimSpace(r.FormValue("consumer")) + if group == "" || consumer == "" { + s.writeJSON(w, http.StatusBadRequest, map[string]string{"error": "group and consumer are required"}) + return + } + worker := s.demo.GetWorker(group, consumer) + if worker == nil { + s.writeJSON(w, http.StatusNotFound, map[string]string{ + "error": fmt.Sprintf("unknown consumer %s/%s", group, consumer), + }) + return + } + result, err := worker.ReapIdlePel(r.Context()) + if err != nil { + s.writeJSON(w, http.StatusInternalServerError, map[string]string{"error": err.Error()}) + return + } + deleted := result.DeletedIDs + if deleted == nil { + deleted = []string{} + } + s.writeJSON(w, http.StatusOK, map[string]interface{}{ + "claimed": result.Claimed, + "processed": result.Processed, + "deleted": deleted, + "min_idle_ms": s.stream.ClaimMinIdleMs(), + }) +} + +func (s *httpServer) handleTrim(w http.ResponseWriter, r *http.Request) { + if err := r.ParseForm(); err != nil { + s.writeJSON(w, http.StatusBadRequest, map[string]string{"error": "bad form"}) + return + } + maxlen := int64(parseIntOr(r.FormValue("maxlen"), 0)) + deleted, err := s.stream.TrimMaxlen(r.Context(), maxlen) + if err != nil { + s.writeJSON(w, http.StatusInternalServerError, map[string]string{"error": err.Error()}) + return + } + s.writeJSON(w, http.StatusOK, map[string]interface{}{ + "deleted": deleted, + "maxlen": maxlen, + }) +} + +func (s *httpServer) handleReplay(w http.ResponseWriter, r *http.Request) { + q := r.URL.Query() + start := q.Get("start") + if start == "" { + start = "-" + } + end := q.Get("end") + if end == "" { + end = "+" + } + limit := int64(clamp(parseIntOr(q.Get("count"), 20), 1, 500)) + entries, err := s.stream.Replay(r.Context(), start, end, limit) + if err != nil { + s.writeJSON(w, 
http.StatusInternalServerError, map[string]string{"error": err.Error()}) + return + } + if entries == nil { + entries = []Entry{} + } + s.writeJSON(w, http.StatusOK, map[string]interface{}{ + "start": start, + "end": end, + "limit": limit, + "entries": entries, + }) +} + +func (s *httpServer) handleReset(w http.ResponseWriter, r *http.Request) { + count, err := s.demo.Reset(r.Context()) + if err != nil { + s.writeJSON(w, http.StatusInternalServerError, map[string]string{"error": err.Error()}) + return + } + s.writeJSON(w, http.StatusOK, map[string]int{"consumers": count}) +} + +// ------------------------------------------------------------------ +// Entry point +// ------------------------------------------------------------------ + +// RunDemoServer parses CLI flags and starts the streaming demo HTTP +// server. It is the entry point your cmd/demo/main.go shim calls. +func RunDemoServer() { + host := flag.String("host", "127.0.0.1", "HTTP bind host") + port := flag.Int("port", 8083, "HTTP bind port") + redisHost := flag.String("redis-host", "localhost", "Redis host") + redisPort := flag.Int("redis-port", 6379, "Redis port") + streamKey := flag.String("stream-key", "demo:events:orders", "Redis Stream key") + maxlen := flag.Int64("maxlen", 2000, "Approximate MAXLEN cap on every XADD") + claimIdleMs := flag.Int64("claim-idle-ms", 5000, + "Minimum idle time before XAUTOCLAIM may reassign a pending entry") + noReset := flag.Bool("no-reset", false, + "Keep any existing data at --stream-key instead of deleting it on startup. 
"+ + "By default the demo wipes the stream so each run starts from an empty state.") + flag.Parse() + + client := redis.NewClient(&redis.Options{ + Addr: fmt.Sprintf("%s:%d", *redisHost, *redisPort), + }) + ctx, cancel := context.WithCancel(context.Background()) + defer cancel() + if err := client.Ping(ctx).Err(); err != nil { + log.Fatalf("could not reach Redis at %s:%d: %v", *redisHost, *redisPort, err) + } + + stream := NewEventStream(client, *streamKey, *maxlen, *claimIdleMs) + demo := NewStreamingDemo(stream) + + if !*noReset { + fmt.Printf("Deleting any existing data at key '%s' for a clean demo run (pass --no-reset to keep it).\n", *streamKey) + if err := stream.DeleteStream(ctx); err != nil { + log.Fatalf("could not delete stream key: %v", err) + } + } + seeded, err := demo.Seed(ctx, DefaultGroups) + if err != nil { + log.Fatalf("could not seed default groups: %v", err) + } + + srv := &httpServer{stream: stream, demo: demo} + + mux := http.NewServeMux() + mux.HandleFunc("/", srv.handleRoot) + mux.HandleFunc("/state", srv.handleState) + mux.HandleFunc("/produce", srv.handleProduce) + mux.HandleFunc("/add-worker", srv.handleAddWorker) + mux.HandleFunc("/remove-worker", srv.handleRemoveWorker) + mux.HandleFunc("/crash", srv.handleCrash) + mux.HandleFunc("/autoclaim", srv.handleAutoclaim) + mux.HandleFunc("/trim", srv.handleTrim) + mux.HandleFunc("/replay", srv.handleReplay) + mux.HandleFunc("/reset", srv.handleReset) + + httpSrv := &http.Server{ + Addr: fmt.Sprintf("%s:%d", *host, *port), + Handler: mux, + } + + fmt.Printf("Redis streaming demo server listening on http://%s:%d\n", *host, *port) + fmt.Printf( + "Using Redis at %s:%d with stream key '%s' (MAXLEN ~ %d)\n", + *redisHost, *redisPort, *streamKey, *maxlen, + ) + fmt.Printf("Seeded %d consumer(s) across %d group(s)\n", seeded, len(DefaultGroups)) + + stop := make(chan os.Signal, 1) + signal.Notify(stop, os.Interrupt, syscall.SIGTERM) + + go func() { + if err := httpSrv.ListenAndServe(); err != nil && 
!errors.Is(err, http.ErrServerClosed) { + log.Fatalf("HTTP server: %v", err) + } + }() + + <-stop + fmt.Println("Shutting down...") + shutdownCtx, shutdownCancel := context.WithTimeout(context.Background(), 5*time.Second) + defer shutdownCancel() + _ = httpSrv.Shutdown(shutdownCtx) + demo.StopAll() +} + +// ------------------------------------------------------------------ +// helpers +// ------------------------------------------------------------------ + +func parseIntOr(s string, fallback int) int { + if s == "" { + return fallback + } + n, err := strconv.Atoi(s) + if err != nil { + return fallback + } + return n +} + +func clamp(v, lo, hi int) int { + if v < lo { + return lo + } + if v > hi { + return hi + } + return v +} + +var customerNames = []string{"alice", "bob", "carol", "dan", "erin"} + +func fakePayload() map[string]string { + return map[string]string{ + "order_id": fmt.Sprintf("o-%d", 1000+rand.Intn(9000)), + "customer": customerNames[rand.Intn(len(customerNames))], + "amount": fmt.Sprintf("%.2f", 5.0+rand.Float64()*245.0), + } +} + +// htmlTemplate is the inlined HTML page. __STREAM_KEY__, __MAXLEN__, +// and __CLAIM_IDLE__ placeholders are substituted per request so the +// rendered page reflects the configured values. +const htmlTemplate = ` + + + + + Redis Streaming Demo + + + +
+
go-redis + Go net/http
+

Redis Streaming Demo

+

+ Producers append events to a single Redis Stream + (__STREAM_KEY__). Two consumer groups read the same + stream independently: notifications shares its work + across two consumers, while analytics processes the full + flow on its own. Acknowledge with XACK, recover + crashed deliveries with XAUTOCLAIM, replay any range + with XRANGE, and bound retention with XTRIM. +

+ +
+
+

Stream state

+
Loading...
+ + +
+ +
+

Produce events

+

Events are appended with XADD, which applies an approximate + MAXLEN ~ __MAXLEN__ retention cap.

+ + + + + +
+ +
+

Replay range (XRANGE)

+

Reads a slice of history. Replay is independent of any + consumer group — no cursors move, no acks happen.

+ + + + + + + +
+ +
+

Trim retention (XTRIM)

+

Cap the stream length. Approximate trimming releases whole + macro-nodes, which is much cheaper than exact trimming.

+ + + +
+ +
+

Consumer groups

+
Loading...
+
+ +
+

Pending entries (XPENDING)

+

Entries delivered to a consumer that haven't been acked yet. + Entries idle for ≥ __CLAIM_IDLE__ ms are eligible for + XAUTOCLAIM.

+
Loading...
+
+ + +
+
+ +
+

Last result

+

Produce events, replay a range, or trigger an autoclaim to see results.

+
+
+ +
+
+ + + + +` diff --git a/content/develop/use-cases/streaming/go/event_stream.go b/content/develop/use-cases/streaming/go/event_stream.go new file mode 100644 index 0000000000..151e5ab2da --- /dev/null +++ b/content/develop/use-cases/streaming/go/event_stream.go @@ -0,0 +1,679 @@ +// Package streaming provides a Redis event-stream helper backed by a +// single Redis Stream. +// +// Producers append events with XADD. Consumers belong to consumer +// groups and read with XREADGROUP. The group as a whole tracks a +// single last-delivered-id cursor, and each consumer gets its own +// pending-entries list (PEL) of in-flight messages it has been handed. +// Once a consumer has processed an entry it acknowledges it with XACK; +// entries left unacknowledged past an idle threshold can be swept to a +// healthy consumer with XAUTOCLAIM (or to a specific one with XCLAIM). +// +// Each XADD carries an approximate MAXLEN so the stream stays bounded +// as it rolls forward. XRANGE supports replay over the retained +// history for debugging, audit, or rebuilding a downstream projection. +// Approximate trimming can release entries that are still in a group's +// PEL: those entries appear in XAUTOCLAIM's deleted-IDs list, which +// the caller should log and route to a dead-letter store. Redis 7+ +// removes them from the PEL inside the XAUTOCLAIM call itself, so no +// explicit XACK is needed. +// +// The same stream can be read by any number of consumer groups — each +// group has its own cursor and its own pending lists, so analytics, +// notifications, and audit can all process the full event flow at +// their own pace without coordinating with each other. +package streaming + +import ( + "context" + "errors" + "fmt" + "strings" + "sync" + "time" + + "github.com/redis/go-redis/v9" +) + +// Entry is one stream entry: an ID plus its field/value map. 
+type Entry struct { + ID string `json:"id"` + Fields map[string]string `json:"fields"` +} + +// StreamInfo is the JSON-friendly subset of XINFO STREAM the demo cares about. +type StreamInfo struct { + Length int64 `json:"length"` + LastGeneratedID string `json:"last_generated_id"` + FirstEntryID string `json:"first_entry_id"` + LastEntryID string `json:"last_entry_id"` +} + +// GroupInfo is the JSON-friendly subset of one XINFO GROUPS row. +type GroupInfo struct { + Name string `json:"name"` + Consumers int64 `json:"consumers"` + Pending int64 `json:"pending"` + LastDeliveredID string `json:"last_delivered_id"` + // Lag is -1 when Redis cannot determine it (e.g. the consumer group + // was created after some entries had already been trimmed). + Lag int64 `json:"lag"` +} + +// ConsumerInfo is the JSON-friendly subset of one XINFO CONSUMERS row. +type ConsumerInfo struct { + Name string `json:"name"` + Pending int64 `json:"pending"` + IdleMs int64 `json:"idle_ms"` +} + +// PendingEntry is one row from XPENDING in detail mode. +type PendingEntry struct { + ID string `json:"id"` + Consumer string `json:"consumer"` + IdleMs int64 `json:"idle_ms"` + Deliveries int64 `json:"deliveries"` +} + +// Stats holds the helper's in-process counters. +type Stats struct { + ProducedTotal int64 `json:"produced_total"` + AckedTotal int64 `json:"acked_total"` + ClaimedTotal int64 `json:"claimed_total"` +} + +// EventStream is the producer/consumer helper around one Redis Stream +// plus its consumer groups. +type EventStream struct { + client *redis.Client + streamKey string + maxlenApprox int64 + claimMinIdleMs int64 + + mu sync.Mutex + producedTotal int64 + ackedTotal int64 + claimedTotal int64 +} + +// NewEventStream constructs an EventStream around the given Redis +// client. maxlenApprox is the approximate MAXLEN cap applied on every +// XADD; claimMinIdleMs is the idle threshold XAUTOCLAIM uses. 
+func NewEventStream(client *redis.Client, streamKey string, maxlenApprox int64, claimMinIdleMs int64) *EventStream { + return &EventStream{ + client: client, + streamKey: streamKey, + maxlenApprox: maxlenApprox, + claimMinIdleMs: claimMinIdleMs, + } +} + +// StreamKey returns the configured stream key. +func (s *EventStream) StreamKey() string { return s.streamKey } + +// MaxlenApprox returns the configured approximate MAXLEN cap. +func (s *EventStream) MaxlenApprox() int64 { return s.maxlenApprox } + +// ClaimMinIdleMs returns the configured XAUTOCLAIM idle threshold. +func (s *EventStream) ClaimMinIdleMs() int64 { return s.claimMinIdleMs } + +// ------------------------------------------------------------------ +// Producer +// ------------------------------------------------------------------ + +// Produce appends a single event. Returns the stream ID Redis assigned. +func (s *EventStream) Produce(ctx context.Context, eventType string, payload map[string]string) (string, error) { + ids, err := s.ProduceBatch(ctx, []ProducerEvent{{Type: eventType, Payload: payload}}) + if err != nil { + return "", err + } + if len(ids) == 0 { + return "", errors.New("XADD returned no id") + } + return ids[0], nil +} + +// ProducerEvent is one event in a ProduceBatch call. +type ProducerEvent struct { + Type string + Payload map[string]string +} + +// ProduceBatch pipelines several XADDs in one round trip. +// +// Each entry carries an approximate MAXLEN cap. The "~" flavour lets +// Redis trim at a macro-node boundary, which is much cheaper than +// exact trimming and is the right call for a retention guardrail +// rather than a hard size limit. 
+func (s *EventStream) ProduceBatch(ctx context.Context, events []ProducerEvent) ([]string, error) { + if len(events) == 0 { + return nil, nil + } + pipe := s.client.Pipeline() + cmds := make([]*redis.StringCmd, 0, len(events)) + for _, ev := range events { + fields := encodeFields(ev.Type, ev.Payload) + cmd := pipe.XAdd(ctx, &redis.XAddArgs{ + Stream: s.streamKey, + MaxLen: s.maxlenApprox, + Approx: true, + Values: fields, + }) + cmds = append(cmds, cmd) + } + if _, err := pipe.Exec(ctx); err != nil { + return nil, err + } + ids := make([]string, 0, len(cmds)) + for _, cmd := range cmds { + id, err := cmd.Result() + if err != nil { + return nil, err + } + ids = append(ids, id) + } + s.mu.Lock() + s.producedTotal += int64(len(ids)) + s.mu.Unlock() + return ids, nil +} + +func encodeFields(eventType string, payload map[string]string) map[string]interface{} { + fields := make(map[string]interface{}, len(payload)+2) + fields["type"] = eventType + fields["ts_ms"] = fmt.Sprintf("%d", time.Now().UnixMilli()) + for k, v := range payload { + fields[k] = v + } + return fields +} + +// ------------------------------------------------------------------ +// Consumer groups +// ------------------------------------------------------------------ + +// EnsureGroup creates the consumer group if it doesn't exist. +// +// "$" means "deliver only events appended after this point"; pass +// "0-0" to replay the entire stream into a fresh group. BUSYGROUP +// errors (group already exists) are swallowed so this is idempotent. +func (s *EventStream) EnsureGroup(ctx context.Context, group, startID string) error { + if startID == "" { + startID = "$" + } + err := s.client.XGroupCreateMkStream(ctx, s.streamKey, group, startID).Err() + if err == nil { + return nil + } + if strings.Contains(err.Error(), "BUSYGROUP") { + return nil + } + return err +} + +// DeleteGroup drops a consumer group entirely. 
+func (s *EventStream) DeleteGroup(ctx context.Context, group string) error { + return s.client.XGroupDestroy(ctx, s.streamKey, group).Err() +} + +// Consume reads new entries for this consumer via XREADGROUP. +// +// The ">" ID means "deliver entries this consumer group has not +// delivered to anyone yet"; that is the at-least-once path. +// Replaying an explicit ID instead would re-deliver an entry that is +// already in this consumer's pending list (see ConsumeOwnPel for that +// recovery path). +func (s *EventStream) Consume(ctx context.Context, group, consumer string, count int64, blockMs int64) ([]Entry, error) { + res, err := s.client.XReadGroup(ctx, &redis.XReadGroupArgs{ + Group: group, + Consumer: consumer, + Streams: []string{s.streamKey, ">"}, + Count: count, + Block: time.Duration(blockMs) * time.Millisecond, // 0 blocks until an entry arrives + }).Result() + if err != nil { + if errors.Is(err, redis.Nil) { + return nil, nil + } + return nil, err + } + return flattenStreams(res), nil +} + +// ConsumeOwnPel re-delivers entries already in this consumer's PEL. +// +// Reading with an explicit ID ("0" here) instead of ">" replays the +// entries already assigned to this consumer name without advancing +// the group's last-delivered-id. This is the canonical recovery path +// after a crash on the same consumer name, and is also how a consumer +// picks up entries that another consumer (or XAUTOCLAIM) handed to it. +func (s *EventStream) ConsumeOwnPel(ctx context.Context, group, consumer string, count int64) ([]Entry, error) { + res, err := s.client.XReadGroup(ctx, &redis.XReadGroupArgs{ + Group: group, + Consumer: consumer, + Streams: []string{s.streamKey, "0"}, + Count: count, + Block: -1, // negative: go-redis omits BLOCK, so the read returns immediately + }).Result() + if err != nil { + if errors.Is(err, redis.Nil) { + return nil, nil + } + return nil, err + } + return flattenStreams(res), nil +} + +// Ack runs XACK on the given IDs and returns the number Redis cleared. 
+func (s *EventStream) Ack(ctx context.Context, group string, ids []string) (int64, error) { + if len(ids) == 0 { + return 0, nil + } + n, err := s.client.XAck(ctx, s.streamKey, group, ids...).Result() + if err != nil { + return 0, err + } + s.mu.Lock() + s.ackedTotal += n + s.mu.Unlock() + return n, nil +} + +// Autoclaim sweeps idle pending entries to the named consumer. +// +// A single XAUTOCLAIM call scans up to pageCount PEL entries starting +// at startID and returns a continuation cursor. For a full sweep of +// the PEL, loop until the cursor returns to "0-0" (or hit maxPages as +// a safety net so a very large PEL can't monopolise the call). +// +// Returns (claimed, deletedIDs). deletedIDs are PEL entries whose +// stream payload had already been trimmed by the time this sweep ran +// (typically because MAXLEN ~ retention outran a slow consumer). +// XAUTOCLAIM removes those dangling slots from the PEL itself — the +// caller does NOT need to XACK them — but they cannot be retried, so +// log and route them to a dead-letter store for observability. +// +// go-redis's typed XAutoClaim wrapper discards the deleted-IDs slot, +// so we issue XAUTOCLAIM through client.Do and parse the raw reply. +func (s *EventStream) Autoclaim(ctx context.Context, group, consumer string, pageCount int64, startID string, maxPages int) ([]Entry, []string, error) { + if startID == "" { + startID = "0-0" + } + if maxPages <= 0 { + maxPages = 10 + } + if pageCount <= 0 { + pageCount = 100 + } + cursor := startID + claimedAll := make([]Entry, 0) + deletedAll := make([]string, 0) + for i := 0; i < maxPages; i++ { + nextCursor, claimed, deleted, err := s.doAutoclaim(ctx, group, consumer, cursor, pageCount) + if err != nil { + return nil, nil, err + } + claimedAll = append(claimedAll, claimed...) + deletedAll = append(deletedAll, deleted...) 
+ if nextCursor == "0-0" { + break + } + cursor = nextCursor + } + s.mu.Lock() + s.claimedTotal += int64(len(claimedAll)) + s.mu.Unlock() + return claimedAll, deletedAll, nil +} + +// doAutoclaim runs one XAUTOCLAIM call via client.Do and parses the raw +// reply so the third (deleted-IDs) element is preserved on Redis 7+. +func (s *EventStream) doAutoclaim(ctx context.Context, group, consumer, startID string, count int64) (string, []Entry, []string, error) { + args := []interface{}{"XAUTOCLAIM", s.streamKey, group, consumer, s.claimMinIdleMs, startID} + if count > 0 { + args = append(args, "COUNT", count) + } + raw, err := s.client.Do(ctx, args...).Result() + if err != nil { + if errors.Is(err, redis.Nil) { + return "0-0", nil, nil, nil + } + return "", nil, nil, err + } + arr, ok := raw.([]interface{}) + if !ok { + return "", nil, nil, fmt.Errorf("XAUTOCLAIM: unexpected reply type %T", raw) + } + if len(arr) < 2 { + return "", nil, nil, fmt.Errorf("XAUTOCLAIM: short reply (%d elements)", len(arr)) + } + + nextCursor, _ := arr[0].(string) + claimedRaw, _ := arr[1].([]interface{}) + claimed := make([]Entry, 0, len(claimedRaw)) + for _, entryRaw := range claimedRaw { + entryArr, ok := entryRaw.([]interface{}) + if !ok || len(entryArr) < 2 { + continue + } + id, _ := entryArr[0].(string) + fields, _ := entryArr[1].([]interface{}) + claimed = append(claimed, Entry{ID: id, Fields: pairsToMap(fields)}) + } + + deleted := make([]string, 0) + if len(arr) >= 3 { + if delRaw, ok := arr[2].([]interface{}); ok { + for _, d := range delRaw { + if id, ok := d.(string); ok { + deleted = append(deleted, id) + } + } + } + } + return nextCursor, claimed, deleted, nil +} + +func pairsToMap(pairs []interface{}) map[string]string { + out := make(map[string]string, len(pairs)/2) + for i := 0; i+1 < len(pairs); i += 2 { + k, _ := pairs[i].(string) + v, _ := pairs[i+1].(string) + out[k] = v + } + return out +} + +// DeleteConsumer drops a consumer from a group. 
+// +// XGROUP DELCONSUMER destroys this consumer's PEL entries — any entry +// it still owned is no longer tracked anywhere in the group, and +// XAUTOCLAIM will never find it again. Always HandoverPending (or +// XCLAIM manually) to a healthy consumer first; this method is the +// raw destructive call and is exposed only for explicit cleanup. +func (s *EventStream) DeleteConsumer(ctx context.Context, group, consumer string) (int64, error) { + n, err := s.client.XGroupDelConsumer(ctx, s.streamKey, group, consumer).Result() + if err != nil { + return 0, err + } + return n, nil +} + +// HandoverPending moves every PEL entry owned by fromConsumer to toConsumer. +// +// Enumerates the source consumer's PEL with XPENDING ... CONSUMER and +// reassigns each ID with XCLAIM at zero idle time so the move is +// unconditional. (XAUTOCLAIM does not filter by source consumer, so it +// cannot be used for a per-consumer handover.) +// +// Call this before DeleteConsumer whenever the source still has +// pending entries — otherwise XGROUP DELCONSUMER would silently +// destroy them and they could never be recovered. 
+func (s *EventStream) HandoverPending(ctx context.Context, group, fromConsumer, toConsumer string, batch int64) (int, error) { + if batch <= 0 { + batch = 100 + } + total := 0 + for { + rows, err := s.client.XPendingExt(ctx, &redis.XPendingExtArgs{ + Stream: s.streamKey, + Group: group, + Start: "-", + End: "+", + Count: batch, + Consumer: fromConsumer, + }).Result() + if err != nil { + return total, err + } + if len(rows) == 0 { + return total, nil + } + ids := make([]string, 0, len(rows)) + for _, row := range rows { + ids = append(ids, row.ID) + } + claimed, err := s.client.XClaim(ctx, &redis.XClaimArgs{ + Stream: s.streamKey, + Group: group, + Consumer: toConsumer, + MinIdle: 0, + Messages: ids, + }).Result() + if err != nil { + return total, err + } + total += len(claimed) + s.mu.Lock() + s.claimedTotal += int64(len(claimed)) + s.mu.Unlock() + if int64(len(rows)) < batch { + return total, nil + } + } +} + +// ------------------------------------------------------------------ +// Replay, length, trim +// ------------------------------------------------------------------ + +// Replay reads a slice of history with XRANGE. +// +// Read-only: ranges do not update any group cursor and do not ack +// anything. Useful for bootstrapping a new projection, for building an +// audit view, or for debugging what actually went through the stream. +func (s *EventStream) Replay(ctx context.Context, startID, endID string, count int64) ([]Entry, error) { + if startID == "" { + startID = "-" + } + if endID == "" { + endID = "+" + } + if count <= 0 { + count = 100 + } + msgs, err := s.client.XRangeN(ctx, s.streamKey, startID, endID, count).Result() + if err != nil { + if errors.Is(err, redis.Nil) { + return nil, nil + } + return nil, err + } + return xMessagesToEntries(msgs), nil +} + +// Tail returns the most recent count entries in reverse-chronological +// order, via XREVRANGE. 
+func (s *EventStream) Tail(ctx context.Context, count int64) ([]Entry, error) { + if count <= 0 { + count = 10 + } + msgs, err := s.client.XRevRangeN(ctx, s.streamKey, "+", "-", count).Result() + if err != nil { + if errors.Is(err, redis.Nil) { + return nil, nil + } + return nil, err + } + return xMessagesToEntries(msgs), nil +} + +// Length runs XLEN. +func (s *EventStream) Length(ctx context.Context) (int64, error) { + n, err := s.client.XLen(ctx, s.streamKey).Result() + if err != nil { + if errors.Is(err, redis.Nil) { + return 0, nil + } + return 0, err + } + return n, nil +} + +// TrimMaxlen runs an approximate XTRIM MAXLEN ~. +func (s *EventStream) TrimMaxlen(ctx context.Context, maxlen int64) (int64, error) { + // The last arg to XTrimMaxLenApprox is the LIMIT modifier, not the target + // length. go-redis omits LIMIT when it is 0, so Redis's default cap applies. + return s.client.XTrimMaxLenApprox(ctx, s.streamKey, maxlen, 0).Result() +} + +// TrimMinid runs an approximate XTRIM MINID ~. +func (s *EventStream) TrimMinid(ctx context.Context, minid string) (int64, error) { + return s.client.XTrimMinIDApprox(ctx, s.streamKey, minid, 0).Result() +} + +// ------------------------------------------------------------------ +// Inspection +// ------------------------------------------------------------------ + +// InfoStream returns a JSON-friendly subset of XINFO STREAM. +func (s *EventStream) InfoStream(ctx context.Context) (StreamInfo, error) { + raw, err := s.client.XInfoStream(ctx, s.streamKey).Result() + if err != nil { + // Any error (typically a missing stream) → return zero-valued info + // so the demo's /state endpoint can render an empty view. 
+ return StreamInfo{}, nil + } + first := "" + if raw.FirstEntry.ID != "" { + first = raw.FirstEntry.ID + } + last := "" + if raw.LastEntry.ID != "" { + last = raw.LastEntry.ID + } + return StreamInfo{ + Length: raw.Length, + LastGeneratedID: raw.LastGeneratedID, + FirstEntryID: first, + LastEntryID: last, + }, nil +} + +// InfoGroups returns a JSON-friendly subset of XINFO GROUPS. +func (s *EventStream) InfoGroups(ctx context.Context) ([]GroupInfo, error) { + raw, err := s.client.XInfoGroups(ctx, s.streamKey).Result() + if err != nil { + return nil, nil + } + out := make([]GroupInfo, 0, len(raw)) + for _, g := range raw { + out = append(out, GroupInfo{ + Name: g.Name, + Consumers: g.Consumers, + Pending: g.Pending, + LastDeliveredID: g.LastDeliveredID, + Lag: g.Lag, + }) + } + return out, nil +} + +// InfoConsumers returns a JSON-friendly subset of XINFO CONSUMERS. +func (s *EventStream) InfoConsumers(ctx context.Context, group string) ([]ConsumerInfo, error) { + raw, err := s.client.XInfoConsumers(ctx, s.streamKey, group).Result() + if err != nil { + return nil, nil + } + out := make([]ConsumerInfo, 0, len(raw)) + for _, c := range raw { + out = append(out, ConsumerInfo{ + Name: c.Name, + Pending: c.Pending, + IdleMs: c.Idle.Milliseconds(), + }) + } + return out, nil +} + +// PendingDetail returns a per-entry PEL view (id, consumer, idle, deliveries). 
+func (s *EventStream) PendingDetail(ctx context.Context, group string, count int64) ([]PendingEntry, error) { + if count <= 0 { + count = 20 + } + rows, err := s.client.XPendingExt(ctx, &redis.XPendingExtArgs{ + Stream: s.streamKey, + Group: group, + Start: "-", + End: "+", + Count: count, + }).Result() + if err != nil { + return nil, nil + } + out := make([]PendingEntry, 0, len(rows)) + for _, r := range rows { + out = append(out, PendingEntry{ + ID: r.ID, + Consumer: r.Consumer, + IdleMs: r.Idle.Milliseconds(), + Deliveries: r.RetryCount, + }) + } + return out, nil +} + +// Stats returns a snapshot of the helper's in-process counters. +func (s *EventStream) Stats() Stats { + s.mu.Lock() + defer s.mu.Unlock() + return Stats{ + ProducedTotal: s.producedTotal, + AckedTotal: s.ackedTotal, + ClaimedTotal: s.claimedTotal, + } +} + +// ResetStats zeroes the helper's in-process counters. +func (s *EventStream) ResetStats() { + s.mu.Lock() + defer s.mu.Unlock() + s.producedTotal = 0 + s.ackedTotal = 0 + s.claimedTotal = 0 +} + +// DeleteStream drops the stream key entirely. Used by the demo's reset +// path; not something a real app should call casually. 
+func (s *EventStream) DeleteStream(ctx context.Context) error { + return s.client.Del(ctx, s.streamKey).Err() +} + +// ------------------------------------------------------------------ +// helpers +// ------------------------------------------------------------------ + +func flattenStreams(streams []redis.XStream) []Entry { + out := make([]Entry, 0) + for _, st := range streams { + for _, msg := range st.Messages { + out = append(out, Entry{ID: msg.ID, Fields: valuesToStringMap(msg.Values)}) + } + } + return out +} + +func xMessagesToEntries(msgs []redis.XMessage) []Entry { + out := make([]Entry, 0, len(msgs)) + for _, msg := range msgs { + out = append(out, Entry{ID: msg.ID, Fields: valuesToStringMap(msg.Values)}) + } + return out +} + +func valuesToStringMap(values map[string]interface{}) map[string]string { + out := make(map[string]string, len(values)) + for k, v := range values { + switch t := v.(type) { + case string: + out[k] = t + case []byte: + out[k] = string(t) + case nil: + out[k] = "" + default: + out[k] = fmt.Sprintf("%v", t) + } + } + return out +} diff --git a/content/develop/use-cases/streaming/go/go.mod b/content/develop/use-cases/streaming/go/go.mod new file mode 100644 index 0000000000..0090e7ba71 --- /dev/null +++ b/content/develop/use-cases/streaming/go/go.mod @@ -0,0 +1,11 @@ +module streaming + +go 1.21 + +require github.com/redis/go-redis/v9 v9.18.0 + +require ( + github.com/cespare/xxhash/v2 v2.3.0 // indirect + github.com/dgryski/go-rendezvous v0.0.0-20200823014737-9f7001d12a5f // indirect + go.uber.org/atomic v1.11.0 // indirect +) diff --git a/content/develop/use-cases/streaming/go/go.sum b/content/develop/use-cases/streaming/go/go.sum new file mode 100644 index 0000000000..e25b1f4d0a --- /dev/null +++ b/content/develop/use-cases/streaming/go/go.sum @@ -0,0 +1,22 @@ +github.com/bsm/ginkgo/v2 v2.12.0 h1:Ny8MWAHyOepLGlLKYmXG4IEkioBysk6GpaRTLC8zwWs= +github.com/bsm/ginkgo/v2 v2.12.0/go.mod h1:SwYbGRRDovPVboqFv0tPTcG1sN61LM1Z4ARdbAV9g4c= 
+github.com/bsm/gomega v1.27.10 h1:yeMWxP2pV2fG3FgAODIY8EiRE3dy0aeFYt4l7wh6yKA= +github.com/bsm/gomega v1.27.10/go.mod h1:JyEr/xRbxbtgWNi8tIEVPUYZ5Dzef52k01W3YH0H+O0= +github.com/cespare/xxhash/v2 v2.3.0 h1:UL815xU9SqsFlibzuggzjXhog7bL6oX9BbNZnL2UFvs= +github.com/cespare/xxhash/v2 v2.3.0/go.mod h1:VGX0DQ3Q6kWi7AoAeZDth3/j3BFtOZR5XLFGgcrjCOs= +github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c= +github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38= +github.com/dgryski/go-rendezvous v0.0.0-20200823014737-9f7001d12a5f h1:lO4WD4F/rVNCu3HqELle0jiPLLBs70cWOduZpkS1E78= +github.com/dgryski/go-rendezvous v0.0.0-20200823014737-9f7001d12a5f/go.mod h1:cuUVRXasLTGF7a8hSLbxyZXjz+1KgoB3wDUb6vlszIc= +github.com/klauspost/cpuid/v2 v2.0.9 h1:lgaqFMSdTdQYdZ04uHyN2d/eKdOMyi2YLSvlQIBFYa4= +github.com/klauspost/cpuid/v2 v2.0.9/go.mod h1:FInQzS24/EEf25PyTYn52gqo7WaD8xa0213Md/qVLRg= +github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM= +github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4= +github.com/redis/go-redis/v9 v9.18.0 h1:pMkxYPkEbMPwRdenAzUNyFNrDgHx9U+DrBabWNfSRQs= +github.com/redis/go-redis/v9 v9.18.0/go.mod h1:k3ufPphLU5YXwNTUcCRXGxUoF1fqxnhFQmscfkCoDA0= +github.com/stretchr/testify v1.3.0 h1:TivCn/peBQ7UY8ooIcPgZFpTNSz0Q2U6UrFlUfqbe0Q= +github.com/stretchr/testify v1.3.0/go.mod h1:M5WIy9Dh21IEIfnGCwXGc5bZfKNJtfHm1UVUgZn+9EI= +github.com/zeebo/xxh3 v1.0.2 h1:xZmwmqxHZA8AI603jOQ0tMqmBr9lPeFwGg6d+xy9DC0= +github.com/zeebo/xxh3 v1.0.2/go.mod h1:5NWz9Sef7zIDm2JHfFlcQvNekmcEl9ekUZQQKCYaDcA= +go.uber.org/atomic v1.11.0 h1:ZvwS0R+56ePWxUNi+Atn9dWONBPp/AUETXlHW0DxSjE= +go.uber.org/atomic v1.11.0/go.mod h1:LUxbIzbOniOlMKjJjyPfpl4v+PKK2cNJn91OQbhoJI0= diff --git a/content/develop/use-cases/streaming/java-jedis/ConsumerWorker.java b/content/develop/use-cases/streaming/java-jedis/ConsumerWorker.java new file mode 100644 index 0000000000..44e76bab54 --- 
/dev/null +++ b/content/develop/use-cases/streaming/java-jedis/ConsumerWorker.java @@ -0,0 +1,308 @@ +import java.util.ArrayDeque; +import java.util.ArrayList; +import java.util.Collections; +import java.util.Deque; +import java.util.LinkedHashMap; +import java.util.List; +import java.util.Map; + +/** + * Background consumer thread for a single consumer in a consumer group. + * + *

<p>Each worker owns a daemon thread that loops on {@code XREADGROUP >} + * with a short block timeout and acks every entry it processes. Recovery + * of stuck PEL entries (this consumer's, or anyone else's) happens + * through {@link #reapIdlePel()}, which is the textbook Streams pattern: + * each consumer periodically (or on demand) calls {@code XAUTOCLAIM} + * with itself as the target, then processes whatever it claimed. The + * demo's "XAUTOCLAIM to selected" button is exactly that call.</p> + * + * <p>Two demo-only levers are wired into the loop:</p> + * <ul> + * <li>{@link #pause()} parks the worker (so its pending entries age + * into the {@code XAUTOCLAIM} window without being consumed by + * {@code >} reads).</li> + * <li>{@link #crashNext(int)} tells the worker to drop its next + * {@code n} deliveries on the floor without acking them — the same + * effect as a worker process dying mid-message.</li> + * </ul> + * + * <p>Real consumers do not need either lever; they only need + * {@code XREADGROUP} → process → {@code XACK} in {@code run} and a + * periodic {@link #reapIdlePel()} call to recover stuck entries.</p>

+ */ +public class ConsumerWorker { + + private final EventStream stream; + private final String group; + private final String name; + private final long processLatencyMs; + private final int recentCapacity; + + private final Object lock = new Object(); + private final Deque<Map<String, Object>> recent; + private long processed; + private long reaped; + private long crashedDrops; + private int crashNext; + private volatile boolean paused; + private volatile boolean stopRequested; + private Thread thread; + + public ConsumerWorker(EventStream stream, String group, String name) { + this(stream, group, name, 25L, 20); + } + + public ConsumerWorker(EventStream stream, String group, String name, + long processLatencyMs, int recentCapacity) { + if (stream == null || group == null || name == null) { + throw new IllegalArgumentException("stream, group, and name are required"); + } + this.stream = stream; + this.group = group; + this.name = name; + this.processLatencyMs = Math.max(0L, processLatencyMs); + this.recentCapacity = recentCapacity > 0 ? 
recentCapacity : 20; + this.recent = new ArrayDeque<>(this.recentCapacity); + } + + public String getName() { + return name; + } + + public String getGroup() { + return group; + } + + // ------------------------------------------------------------------ + // Lifecycle + // ------------------------------------------------------------------ + + public synchronized void start() { + if (thread != null && thread.isAlive()) { + return; + } + stopRequested = false; + thread = new Thread(this::run, "consumer-" + group + "-" + name); + thread.setDaemon(true); + thread.start(); + } + + public synchronized void stop() { + stop(1000L); + } + + public synchronized void stop(long joinTimeoutMs) { + stopRequested = true; + if (thread == null) { + return; + } + try { + thread.join(joinTimeoutMs); + } catch (InterruptedException e) { + Thread.currentThread().interrupt(); + } + if (!thread.isAlive()) { + thread = null; + } + } + + // ------------------------------------------------------------------ + // Demo levers + // ------------------------------------------------------------------ + + public void pause() { + paused = true; + } + + public void resume() { + paused = false; + } + + /** + * Drop the next {@code count} deliveries without acking them. + * + *

<p>The entries stay in the group's PEL with their delivery counter + * incremented, so {@code XAUTOCLAIM} can recover them once they + * exceed the idle threshold.</p>

+ */ + public void crashNext(int count) { + if (count <= 0) { + return; + } + synchronized (lock) { + crashNext += count; + } + } + + // ------------------------------------------------------------------ + // Introspection + // ------------------------------------------------------------------ + + public List<Map<String, Object>> recent() { + synchronized (lock) { + return new ArrayList<>(recent); + } + } + + public Map<String, Object> status() { + Map<String, Object> status = new LinkedHashMap<>(); + synchronized (lock) { + status.put("name", name); + status.put("group", group); + status.put("processed", processed); + status.put("reaped", reaped); + status.put("crashed_drops", crashedDrops); + status.put("paused", paused); + status.put("crash_queued", (long) crashNext); + status.put("alive", thread != null && thread.isAlive()); + } + return status; + } + + // ------------------------------------------------------------------ + // Recovery + // ------------------------------------------------------------------ + + /** Result returned by {@link #reapIdlePel()}. */ + public static final class ReapResult { + public final int claimed; + public final int processed; + public final List<String> deletedIds; + + public ReapResult(int claimed, int processed, List<String> deletedIds) { + this.claimed = claimed; + this.processed = processed; + this.deletedIds = deletedIds; + } + } + + /** + * Run {@code XAUTOCLAIM} into self and process the claimed entries. + * + *

<p>Safe to call from any thread — the heavy lifting is + * {@code stream.autoclaim} (a Redis call) and the sequential + * per-entry dispatch via {@link #handleEntry}. The + * {@code deletedIds} list contains PEL entries whose stream payload + * was already trimmed by {@code MAXLEN ~} / {@code XTRIM} before the + * sweep ran. Redis 7+ removes them from the PEL inside + * {@code XAUTOCLAIM} itself, so the caller does not have to + * {@code XACK} them; they are reported so the caller can route them + * to a dead-letter store.</p>

+ */ + public ReapResult reapIdlePel() { + EventStream.AutoClaimResult result = stream.autoclaim(group, name, 100, "0-0", 10); + int processedCount = 0; + for (EventStream.Entry entry : result.claimed) { + try { + handleEntry(entry.id, entry.fields); + processedCount++; + } catch (Exception exc) { + System.err.printf( + "[%s/%s] reap failed on %s: %s%n", + group, name, entry.id, exc); + } + } + synchronized (lock) { + reaped += processedCount; + } + return new ReapResult( + result.claimed.size(), processedCount, + result.deletedIds == null ? Collections.emptyList() : result.deletedIds); + } + + // ------------------------------------------------------------------ + // Main loop + // ------------------------------------------------------------------ + + private void run() { + while (!stopRequested) { + if (paused) { + sleep(50L); + continue; + } + List<EventStream.Entry> entries; + try { + entries = stream.consume(group, name, 10, 500L); + } catch (Exception exc) { + // Don't kill the thread on a transient Redis error; a + // real consumer would log this and back off. + System.err.printf("[%s/%s] read failed: %s%n", group, name, exc); + sleep(500L); + continue; + } + if (entries == null) { + continue; + } + for (EventStream.Entry entry : entries) { + dispatch(entry.id, entry.fields); + } + } + } + + private void dispatch(String entryId, Map<String, String> fields) { + if (processLatencyMs > 0L) { + sleep(processLatencyMs); + } + try { + handleEntry(entryId, fields); + } catch (Exception exc) { + // A failure here (typically XACK against Redis) must not + // kill the daemon thread — that would silently halt this + // consumer while every other entry sat in its PEL waiting + // for XAUTOCLAIM. The entry stays unacked; the next + // reapIdlePel call (here or on any consumer in the group) + // can recover it once it exceeds the idle threshold. 
+ System.err.printf("[%s/%s] failed to handle %s: %s%n", group, name, entryId, exc); + synchronized (lock) { + recordRecent(entryId, fields, false, "handler error: " + exc); + } + } + } + + private void handleEntry(String entryId, Map<String, String> fields) { + boolean drop; + synchronized (lock) { + drop = crashNext > 0; + if (drop) { + crashNext--; + } + } + + if (drop) { + synchronized (lock) { + crashedDrops++; + recordRecent(entryId, fields, false, "dropped (simulated crash)"); + } + return; + } + + stream.ack(group, Collections.singletonList(entryId)); + synchronized (lock) { + processed++; + recordRecent(entryId, fields, true, ""); + } + } + + private void recordRecent( + String entryId, Map<String, String> fields, boolean acked, String note) { + Map<String, Object> rec = new LinkedHashMap<>(); + rec.put("id", entryId); + rec.put("type", fields.getOrDefault("type", "")); + rec.put("fields", fields); + rec.put("acked", acked); + rec.put("note", note); + if (recent.size() >= recentCapacity) { + recent.pollLast(); + } + recent.addFirst(rec); + } + + private static void sleep(long ms) { + try { + Thread.sleep(ms); + } catch (InterruptedException e) { + Thread.currentThread().interrupt(); + } + } +} diff --git a/content/develop/use-cases/streaming/java-jedis/DemoServer.java b/content/develop/use-cases/streaming/java-jedis/DemoServer.java new file mode 100644 index 0000000000..dd0f2e3992 --- /dev/null +++ b/content/develop/use-cases/streaming/java-jedis/DemoServer.java @@ -0,0 +1,1215 @@ +import com.sun.net.httpserver.HttpExchange; +import com.sun.net.httpserver.HttpHandler; +import com.sun.net.httpserver.HttpServer; + +import java.io.IOException; +import java.io.InputStream; +import java.io.OutputStream; +import java.net.InetSocketAddress; +import java.net.URLDecoder; +import java.nio.charset.StandardCharsets; +import java.util.ArrayList; +import java.util.Arrays; +import java.util.HashMap; +import java.util.LinkedHashMap; +import java.util.LinkedHashSet; +import 
java.util.List; +import java.util.Map; +import 
java.util.Random; +import java.util.Set; +import java.util.concurrent.Executors; +import java.util.concurrent.locks.ReentrantLock; + +import redis.clients.jedis.JedisPool; +import redis.clients.jedis.JedisPoolConfig; + +/** + * Redis streaming demo server (Jedis + JDK HttpServer). + * + *

<p>Run this file and visit {@code http://localhost:8083} to watch a + * Redis Stream in action: producers append events to a single stream, + * two independent consumer groups read the same stream at their own + * pace, and within the {@code notifications} group two consumers share + * the work.</p>

+ */ +public class DemoServer { + + private static final String[] EVENT_TYPES = { + "order.placed", "order.paid", "order.shipped", "order.cancelled" + }; + private static final Map<String, List<String>> DEFAULT_GROUPS = new LinkedHashMap<>(); + static { + DEFAULT_GROUPS.put("notifications", Arrays.asList("worker-a", "worker-b")); + DEFAULT_GROUPS.put("analytics", Arrays.asList("worker-c")); + } + + private static JedisPool jedisPool; + private static EventStream stream; + private static StreamingDemo demo; + private static final Random random = new Random(); + + public static void main(String[] args) { + String host = "127.0.0.1"; + int port = 8083; + String redisHost = "localhost"; + int redisPort = 6379; + String streamKey = EventStream.DEFAULT_STREAM_KEY; + int maxlen = 2000; + int claimIdleMs = 5000; + boolean resetOnStart = true; + + for (int i = 0; i < args.length; i++) { + switch (args[i]) { + case "--host": + host = args[++i]; + break; + case "--port": + port = Integer.parseInt(args[++i]); + break; + case "--redis-host": + redisHost = args[++i]; + break; + case "--redis-port": + redisPort = Integer.parseInt(args[++i]); + break; + case "--stream-key": + streamKey = args[++i]; + break; + case "--maxlen": + maxlen = Integer.parseInt(args[++i]); + break; + case "--claim-idle-ms": + claimIdleMs = Integer.parseInt(args[++i]); + break; + case "--no-reset": + resetOnStart = false; + break; + default: + break; + } + } + + try { + JedisPoolConfig config = new JedisPoolConfig(); + // Sized for: 3 workers (blocking XREADGROUP) + 16 HTTP handlers + // + small headroom. Each blocking call holds its Jedis for the + // block duration, so undersizing the pool would starve handlers. 
+ config.setMaxTotal(64); + config.setMaxIdle(32); + jedisPool = new JedisPool(config, redisHost, redisPort); + jedisPool.getResource().close(); + } catch (Exception e) { + System.err.printf("Failed to connect to Redis at %s:%d: %s%n", redisHost, redisPort, e.getMessage()); + System.exit(1); + } + + stream = new EventStream(jedisPool, streamKey, maxlen, claimIdleMs); + demo = new StreamingDemo(stream); + + if (resetOnStart) { + System.out.printf( + "Deleting any existing data at key '%s' for a clean demo run (pass --no-reset to keep it).%n", + streamKey); + stream.deleteStream(); + } + int seeded = demo.seed(DEFAULT_GROUPS); + + try { + HttpServer server = HttpServer.create(new InetSocketAddress(host, port), 0); + server.createContext("/", new RootHandler()); + server.createContext("/state", new StateHandler()); + server.createContext("/produce", new ProduceHandler()); + server.createContext("/add-worker", new AddWorkerHandler()); + server.createContext("/remove-worker", new RemoveWorkerHandler()); + server.createContext("/crash", new CrashHandler()); + server.createContext("/autoclaim", new AutoclaimHandler()); + server.createContext("/trim", new TrimHandler()); + server.createContext("/replay", new ReplayHandler()); + server.createContext("/reset", new ResetHandler()); + server.setExecutor(Executors.newFixedThreadPool(16)); + server.start(); + + System.out.printf("Redis streaming demo server listening on http://%s:%d%n", host, port); + System.out.printf( + "Using Redis at %s:%d with stream key '%s' (MAXLEN ~ %d)%n", + redisHost, redisPort, streamKey, maxlen); + System.out.printf( + "Seeded %d consumer(s) across %d group(s)%n", + seeded, DEFAULT_GROUPS.size()); + + Runtime.getRuntime().addShutdownHook(new Thread(() -> { + demo.stopAll(); + server.stop(0); + jedisPool.close(); + })); + } catch (IOException e) { + System.err.println("Failed to start server: " + e.getMessage()); + demo.stopAll(); + jedisPool.close(); + System.exit(1); + } + } + + // 
------------------------------------------------------------------ + // Demo registry + // ------------------------------------------------------------------ + + /** + * In-memory registry of consumer workers across all groups. + * + *

<p>The JDK HttpServer dispatches each HTTP request on a worker + * thread, so any code that mutates {@code workers} (or iterates it + * while another handler is mutating it) needs the lock.</p>

+ */ + static class StreamingDemo { + private final EventStream stream; + private final Map<String, ConsumerWorker> workers = new LinkedHashMap<>(); + private final ReentrantLock lock = new ReentrantLock(); + + StreamingDemo(EventStream stream) { + this.stream = stream; + } + + int seed(Map<String, List<String>> groups) { + lock.lock(); + try { + for (Map.Entry<String, List<String>> entry : groups.entrySet()) { + stream.ensureGroup(entry.getKey(), "0-0"); + for (String name : entry.getValue()) { + addWorkerInternal(entry.getKey(), name); + } + } + int total = 0; + for (List<String> v : groups.values()) { + total += v.size(); + } + return total; + } finally { + lock.unlock(); + } + } + + boolean addWorker(String group, String name) { + lock.lock(); + try { + return addWorkerInternal(group, name); + } finally { + lock.unlock(); + } + } + + private boolean addWorkerInternal(String group, String name) { + String key = workerKey(group, name); + if (workers.containsKey(key)) { + return false; + } + stream.ensureGroup(group, "0-0"); + ConsumerWorker worker = new ConsumerWorker(stream, group, name); + worker.start(); + workers.put(key, worker); + return true; + } + + Map<String, Object> removeWorker(String group, String name) { + lock.lock(); + try { + String key = workerKey(group, name); + ConsumerWorker worker = workers.get(key); + Map<String, Object> result = new LinkedHashMap<>(); + if (worker == null) { + result.put("removed", false); + result.put("reason", "not-found"); + return result; + } + + String handoverTarget = null; + for (Map.Entry<String, ConsumerWorker> e : workers.entrySet()) { + ConsumerWorker w = e.getValue(); + if (w.getGroup().equals(group) && !w.getName().equals(name)) { + handoverTarget = w.getName(); + break; + } + } + if (handoverTarget == null) { + result.put("removed", false); + result.put("reason", "no-peer"); + result.put("message", + group + "/" + name + " still owns pending entries and is the only " + + "consumer in its group; add another consumer first so 
its " + + "PEL can be handed over before deletion."); + return result; + } + + int handed = 
stream.handoverPending(group, name, handoverTarget, 100); + + workers.remove(key); + worker.stop(); + stream.deleteConsumer(group, name); + + result.put("removed", true); + result.put("handed_over_to", handoverTarget); + result.put("handed_over_count", (long) handed); + return result; + } finally { + lock.unlock(); + } + } + + ConsumerWorker getWorker(String group, String name) { + lock.lock(); + try { + return workers.get(workerKey(group, name)); + } finally { + lock.unlock(); + } + } + + List workersSnapshot() { + lock.lock(); + try { + return new ArrayList<>(workers.values()); + } finally { + lock.unlock(); + } + } + + void stopAll() { + lock.lock(); + try { + for (ConsumerWorker w : workers.values()) { + w.stop(); + } + workers.clear(); + } finally { + lock.unlock(); + } + } + + int reset() { + lock.lock(); + try { + stopAll(); + stream.deleteStream(); + stream.resetStats(); + return seed(DEFAULT_GROUPS); + } finally { + lock.unlock(); + } + } + + private static String workerKey(String group, String name) { + return group + "" + name; + } + } + + // ------------------------------------------------------------------ + // Handlers + // ------------------------------------------------------------------ + + static class RootHandler implements HttpHandler { + @Override + public void handle(HttpExchange exchange) throws IOException { + String path = exchange.getRequestURI().getPath(); + if (!"GET".equalsIgnoreCase(exchange.getRequestMethod())) { + sendJson(exchange, 405, "{\"error\":\"Method Not Allowed\"}"); + return; + } + if (path.equals("/") || path.equals("/index.html")) { + byte[] body = renderHtmlPage().getBytes(StandardCharsets.UTF_8); + exchange.getResponseHeaders().set("Content-Type", "text/html; charset=utf-8"); + exchange.sendResponseHeaders(200, body.length); + try (OutputStream os = exchange.getResponseBody()) { + os.write(body); + } + return; + } + sendJson(exchange, 404, "{\"error\":\"Not Found\"}"); + } + } + + static class StateHandler implements 
HttpHandler { + @Override + public void handle(HttpExchange exchange) throws IOException { + if (!"GET".equalsIgnoreCase(exchange.getRequestMethod())) { + sendJson(exchange, 405, "{\"error\":\"Method Not Allowed\"}"); + return; + } + sendJson(exchange, 200, toJson(buildState())); + } + } + + static class ProduceHandler implements HttpHandler { + @Override + public void handle(HttpExchange exchange) throws IOException { + if (!"POST".equalsIgnoreCase(exchange.getRequestMethod())) { + sendJson(exchange, 405, "{\"error\":\"Method Not Allowed\"}"); + return; + } + Map<String, String> form = parseFormData(readRequestBody(exchange)); + int count = clamp(parseIntOr(form.get("count"), 1), 1, 500); + String type = form.getOrDefault("type", "").trim(); + List<Map.Entry<String, Map<String, String>>> events = new ArrayList<>(count); + for (int i = 0; i < count; i++) { + String picked = type.isEmpty() ? EVENT_TYPES[random.nextInt(EVENT_TYPES.length)] : type; + events.add(new java.util.AbstractMap.SimpleEntry<>(picked, fakePayload())); + } + List<String> ids = stream.produceBatch(events); + Map<String, Object> response = new LinkedHashMap<>(); + response.put("produced", (long) ids.size()); + response.put("ids", ids); + sendJson(exchange, 200, toJson(response)); + } + } + + static class AddWorkerHandler implements HttpHandler { + @Override + public void handle(HttpExchange exchange) throws IOException { + if (!"POST".equalsIgnoreCase(exchange.getRequestMethod())) { + sendJson(exchange, 405, "{\"error\":\"Method Not Allowed\"}"); + return; + } + Map<String, String> form = parseFormData(readRequestBody(exchange)); + String group = form.getOrDefault("group", "").trim(); + String name = form.getOrDefault("name", "").trim(); + if (group.isEmpty() || name.isEmpty()) { + sendJson(exchange, 400, "{\"error\":\"group and name are required\"}"); + return; + } + boolean added = demo.addWorker(group, name); + if (!added) { + sendJson(exchange, 409, + "{\"error\":\"" + jsonEscape(group + "/" + name 
+ " already exists") + "\"}"); + return; + } + Map<String, Object> response = new LinkedHashMap<>(); + 
response.put("group", group); + response.put("name", name); + sendJson(exchange, 200, toJson(response)); + } + } + + static class RemoveWorkerHandler implements HttpHandler { + @Override + public void handle(HttpExchange exchange) throws IOException { + if (!"POST".equalsIgnoreCase(exchange.getRequestMethod())) { + sendJson(exchange, 405, "{\"error\":\"Method Not Allowed\"}"); + return; + } + Map<String, String> form = parseFormData(readRequestBody(exchange)); + String group = form.getOrDefault("group", "").trim(); + String name = form.getOrDefault("name", "").trim(); + Map<String, Object> result = demo.removeWorker(group, name); + int status = 200; + Object removed = result.get("removed"); + Object reason = result.get("reason"); + if (!Boolean.TRUE.equals(removed) && !"not-found".equals(reason)) { + status = 409; + } + sendJson(exchange, status, toJson(result)); + } + } + + static class CrashHandler implements HttpHandler { + @Override + public void handle(HttpExchange exchange) throws IOException { + if (!"POST".equalsIgnoreCase(exchange.getRequestMethod())) { + sendJson(exchange, 405, "{\"error\":\"Method Not Allowed\"}"); + return; + } + Map<String, String> form = parseFormData(readRequestBody(exchange)); + String group = form.getOrDefault("group", "").trim(); + String name = form.getOrDefault("name", "").trim(); + int count = parseIntOr(form.get("count"), 1); + ConsumerWorker worker = demo.getWorker(group, name); + if (worker == null) { + sendJson(exchange, 404, + "{\"error\":\"" + jsonEscape("unknown consumer " + group + "/" + name) + "\"}"); + return; + } + worker.crashNext(count); + sendJson(exchange, 200, "{\"queued\":" + count + "}"); + } + } + + static class AutoclaimHandler implements HttpHandler { + @Override + public void handle(HttpExchange exchange) throws IOException { + if (!"POST".equalsIgnoreCase(exchange.getRequestMethod())) { + sendJson(exchange, 405, "{\"error\":\"Method Not Allowed\"}"); + return; + } + Map<String, String> form = 
parseFormData(readRequestBody(exchange)); + String group = 
form.getOrDefault("group", "").trim(); + String consumer = form.getOrDefault("consumer", "").trim(); + if (group.isEmpty() || consumer.isEmpty()) { + sendJson(exchange, 400, "{\"error\":\"group and consumer are required\"}"); + return; + } + ConsumerWorker worker = demo.getWorker(group, consumer); + if (worker == null) { + sendJson(exchange, 404, + "{\"error\":\"" + jsonEscape("unknown consumer " + group + "/" + consumer) + "\"}"); + return; + } + ConsumerWorker.ReapResult result = worker.reapIdlePel(); + Map<String, Object> response = new LinkedHashMap<>(); + response.put("claimed", (long) result.claimed); + response.put("processed", (long) result.processed); + response.put("deleted", result.deletedIds); + response.put("min_idle_ms", (long) stream.getClaimMinIdleMs()); + sendJson(exchange, 200, toJson(response)); + } + } + + static class TrimHandler implements HttpHandler { + @Override + public void handle(HttpExchange exchange) throws IOException { + if (!"POST".equalsIgnoreCase(exchange.getRequestMethod())) { + sendJson(exchange, 405, "{\"error\":\"Method Not Allowed\"}"); + return; + } + Map<String, String> form = parseFormData(readRequestBody(exchange)); + long maxlen = parseIntOr(form.get("maxlen"), 0); + long deleted = stream.trimMaxlen(maxlen); + Map<String, Object> response = new LinkedHashMap<>(); + response.put("deleted", deleted); + response.put("maxlen", maxlen); + sendJson(exchange, 200, toJson(response)); + } + } + + static class ReplayHandler implements HttpHandler { + @Override + public void handle(HttpExchange exchange) throws IOException { + if (!"GET".equalsIgnoreCase(exchange.getRequestMethod())) { + sendJson(exchange, 405, "{\"error\":\"Method Not Allowed\"}"); + return; + } + Map<String, String> query = parseQuery(exchange.getRequestURI().getRawQuery()); + String start = query.getOrDefault("start", "-"); + if (start.isEmpty()) start = "-"; + String end = query.getOrDefault("end", "+"); + if (end.isEmpty()) end = "+"; + int limit = 
clamp(parseIntOr(query.get("count"), 20), 1, 500); + List<EventStream.Entry> entries = 
stream.replay(start, end, limit); + List<Map<String, Object>> entriesJson = new ArrayList<>(entries.size()); + for (EventStream.Entry entry : entries) { + Map<String, Object> e = new LinkedHashMap<>(); + e.put("id", entry.id); + e.put("fields", entry.fields); + entriesJson.add(e); + } + Map<String, Object> response = new LinkedHashMap<>(); + response.put("start", start); + response.put("end", end); + response.put("limit", (long) limit); + response.put("entries", entriesJson); + sendJson(exchange, 200, toJson(response)); + } + } + + static class ResetHandler implements HttpHandler { + @Override + public void handle(HttpExchange exchange) throws IOException { + if (!"POST".equalsIgnoreCase(exchange.getRequestMethod())) { + sendJson(exchange, 405, "{\"error\":\"Method Not Allowed\"}"); + return; + } + int count = demo.reset(); + sendJson(exchange, 200, "{\"consumers\":" + count + "}"); + } + } + + // ------------------------------------------------------------------ + // State assembly + // ------------------------------------------------------------------ + + private static Map<String, Object> buildState() { + Map<String, Object> streamInfo = stream.infoStream(); + List<Map<String, Object>> groups = stream.infoGroups(); + + List<Map<String, Object>> groupsDetail = new ArrayList<>(groups.size()); + List<Map<String, Object>> pendingRows = new ArrayList<>(); + List<ConsumerWorker> workersSnapshot = demo.workersSnapshot(); + + for (Map<String, Object> group : groups) { + String groupName = (String) group.get("name"); + Map<String, Map<String, Object>> consumerInfo = new LinkedHashMap<>(); + for (Map<String, Object> c : stream.infoConsumers(groupName)) { + consumerInfo.put((String) c.get("name"), c); + } + List<Map<String, Object>> consumersDetail = new ArrayList<>(); + Set<String> seenNames = new LinkedHashSet<>(); + for (ConsumerWorker worker : workersSnapshot) { + if (!worker.getGroup().equals(groupName)) { + continue; + } + Map<String, Object> status = worker.status(); + 
Map<String, Object> info = consumerInfo.get(worker.getName()); + status.put("pending", info == null ? 0L : info.get("pending")); + status.put("idle_ms", info == null ? 
0L : info.get("idle_ms")); + status.put("recent", worker.recent()); + consumersDetail.add(status); + seenNames.add(worker.getName()); + } + // Include consumers that exist in Redis but not in our in-process registry. + for (Map.Entry> e : consumerInfo.entrySet()) { + if (seenNames.contains(e.getKey())) { + continue; + } + Map orphan = new LinkedHashMap<>(); + orphan.put("name", e.getKey()); + orphan.put("group", groupName); + orphan.put("processed", 0L); + orphan.put("reaped", 0L); + orphan.put("crashed_drops", 0L); + orphan.put("paused", false); + orphan.put("crash_queued", 0L); + orphan.put("alive", false); + orphan.put("pending", e.getValue().get("pending")); + orphan.put("idle_ms", e.getValue().get("idle_ms")); + orphan.put("recent", new ArrayList<>()); + consumersDetail.add(orphan); + } + consumersDetail.sort((a, b) -> + String.valueOf(a.get("name")).compareTo(String.valueOf(b.get("name")))); + Map g = new LinkedHashMap<>(group); + g.put("consumers_detail", consumersDetail); + groupsDetail.add(g); + + for (Map p : stream.pendingDetail(groupName, 50)) { + Map row = new LinkedHashMap<>(p); + row.put("group", groupName); + pendingRows.add(row); + } + } + + // XREVRANGE returns the newest N entries (in reverse order); the + // tail view wants the most recent activity, not the head of history. 
+ List tailEntries = stream.tail(10); + List> tail = new ArrayList<>(tailEntries.size()); + for (EventStream.Entry entry : tailEntries) { + Map e = new LinkedHashMap<>(); + e.put("id", entry.id); + e.put("fields", entry.fields); + tail.add(e); + } + + Map state = new LinkedHashMap<>(); + state.put("stream", streamInfo); + state.put("tail", tail); + state.put("groups", groupsDetail); + state.put("pending", pendingRows); + state.put("stats", stream.stats()); + return state; + } + + // ------------------------------------------------------------------ + // Helpers + // ------------------------------------------------------------------ + + private static Map fakePayload() { + Map payload = new LinkedHashMap<>(); + payload.put("order_id", String.format("o-%d", 1000 + random.nextInt(9000))); + String[] customers = {"alice", "bob", "carol", "dan", "erin"}; + payload.put("customer", customers[random.nextInt(customers.length)]); + double amount = 5.0 + random.nextDouble() * (250.0 - 5.0); + payload.put("amount", String.format("%.2f", amount)); + return payload; + } + + private static int parseIntOr(String value, int defaultValue) { + if (value == null || value.isEmpty()) { + return defaultValue; + } + try { + return Integer.parseInt(value); + } catch (NumberFormatException e) { + return defaultValue; + } + } + + private static int clamp(int value, int min, int max) { + return Math.max(min, Math.min(max, value)); + } + + private static String readRequestBody(HttpExchange exchange) throws IOException { + try (InputStream inputStream = exchange.getRequestBody()) { + return new String(inputStream.readAllBytes(), StandardCharsets.UTF_8); + } + } + + private static Map parseFormData(String body) { + Map params = new HashMap<>(); + if (body == null || body.isEmpty()) { + return params; + } + for (String pair : body.split("&")) { + String[] kv = pair.split("=", 2); + if (kv.length == 0 || kv[0].isEmpty()) { + continue; + } + String key = URLDecoder.decode(kv[0], 
StandardCharsets.UTF_8); + String value = kv.length == 2 ? URLDecoder.decode(kv[1], StandardCharsets.UTF_8) : ""; + params.put(key, value); + } + return params; + } + + private static Map parseQuery(String query) { + if (query == null || query.isEmpty()) { + return new HashMap<>(); + } + return parseFormData(query); + } + + private static void sendJson(HttpExchange exchange, int status, String body) throws IOException { + byte[] bytes = body.getBytes(StandardCharsets.UTF_8); + exchange.getResponseHeaders().set("Content-Type", "application/json"); + exchange.sendResponseHeaders(status, bytes.length); + try (OutputStream os = exchange.getResponseBody()) { + os.write(bytes); + } + } + + private static String toJson(Object value) { + StringBuilder sb = new StringBuilder(); + appendJson(sb, value); + return sb.toString(); + } + + @SuppressWarnings("unchecked") + private static void appendJson(StringBuilder sb, Object value) { + if (value == null) { + sb.append("null"); + } else if (value instanceof Boolean) { + sb.append(value); + } else if (value instanceof Number) { + sb.append(value); + } else if (value instanceof Map) { + sb.append('{'); + boolean first = true; + for (Map.Entry entry : ((Map) value).entrySet()) { + if (!first) sb.append(','); + first = false; + appendJsonString(sb, String.valueOf(entry.getKey())); + sb.append(':'); + appendJson(sb, entry.getValue()); + } + sb.append('}'); + } else if (value instanceof List) { + sb.append('['); + boolean first = true; + for (Object item : (List) value) { + if (!first) sb.append(','); + first = false; + appendJson(sb, item); + } + sb.append(']'); + } else { + appendJsonString(sb, String.valueOf(value)); + } + } + + private static void appendJsonString(StringBuilder sb, String value) { + sb.append('"').append(jsonEscape(value)).append('"'); + } + + private static String jsonEscape(String value) { + StringBuilder sb = new StringBuilder(value.length() + 4); + for (int i = 0; i < value.length(); i++) { + char c = 
value.charAt(i); + switch (c) { + case '"': sb.append("\\\""); break; + case '\\': sb.append("\\\\"); break; + case '\n': sb.append("\\n"); break; + case '\r': sb.append("\\r"); break; + case '\t': sb.append("\\t"); break; + default: + if (c < 0x20) { + sb.append(String.format("\\u%04x", (int) c)); + } else { + sb.append(c); + } + } + } + return sb.toString(); + } + + private static String renderHtmlPage() { + return HTML_TEMPLATE + .replace("__STREAM_KEY__", stream.getStreamKey()) + .replace("__MAXLEN__", Integer.toString(stream.getMaxlenApprox())) + .replace("__CLAIM_IDLE__", Integer.toString(stream.getClaimMinIdleMs())); + } + + private static final String HTML_TEMPLATE = """ + + + + + + Redis Streaming Demo + + + +
+
Jedis + com.sun.net.httpserver
+

Redis Streaming Demo

+

+ Producers append events to a single Redis Stream + (__STREAM_KEY__). Two consumer groups read the same + stream independently: notifications shares its work + across two consumers, analytics processes the full + flow on its own. Acknowledge with XACK, recover + crashed deliveries with XAUTOCLAIM, replay any range + with XRANGE, and bound retention with XTRIM. +

+ +
+
+

Stream state

+
Loading...
+ + +
+ +
+

Produce events

+

Events are appended with XADD with an approximate + MAXLEN ~ __MAXLEN__ retention cap.

+ + + + + +
+ +
+

Replay range (XRANGE)

+

Reads a slice of history. Replay is independent of any + consumer group — no cursors move, no acks happen.

+ + + + + + + +
+ +
+

Trim retention (XTRIM)

+

Cap the stream length. Approximate trimming releases whole + macro-nodes, which is much cheaper than exact trimming.

+ + + +
+ +
+

Consumer groups

+
Loading...
+
+ +
+

Pending entries (XPENDING)

+

Entries delivered to a consumer that haven't been acked yet. + Entries idle for ≥ __CLAIM_IDLE__ ms are eligible for + XAUTOCLAIM.

+
Loading...
+
+ + +
+
+ +
+

Last result

+

Produce events, replay a range, or trigger an autoclaim to see results.

+
+
+ +
+
+ + + + +"""; +} diff --git a/content/develop/use-cases/streaming/java-jedis/EventStream.java b/content/develop/use-cases/streaming/java-jedis/EventStream.java new file mode 100644 index 0000000000..c0d397a828 --- /dev/null +++ b/content/develop/use-cases/streaming/java-jedis/EventStream.java @@ -0,0 +1,671 @@ +import java.util.ArrayList; +import java.util.Collections; +import java.util.LinkedHashMap; +import java.util.List; +import java.util.Map; + +import redis.clients.jedis.Jedis; +import redis.clients.jedis.JedisPool; +import redis.clients.jedis.Pipeline; +import redis.clients.jedis.Response; +import redis.clients.jedis.StreamEntryID; +import redis.clients.jedis.commands.ProtocolCommand; +import redis.clients.jedis.exceptions.JedisDataException; +import redis.clients.jedis.params.XAddParams; +import redis.clients.jedis.params.XAutoClaimParams; +import redis.clients.jedis.params.XClaimParams; +import redis.clients.jedis.params.XPendingParams; +import redis.clients.jedis.params.XReadGroupParams; +import redis.clients.jedis.params.XTrimParams; +import redis.clients.jedis.resps.StreamConsumerInfo; +import redis.clients.jedis.resps.StreamEntry; +import redis.clients.jedis.resps.StreamGroupInfo; +import redis.clients.jedis.resps.StreamInfo; +import redis.clients.jedis.resps.StreamPendingEntry; +import redis.clients.jedis.util.SafeEncoder; + +/** + * Redis event-stream helper backed by a single Redis Stream. + * + *

Producers append events with {@code XADD}. Consumers belong to + * consumer groups and read with {@code XREADGROUP}. The group as a whole + * tracks a single {@code last-delivered-id} cursor, and each consumer + * gets its own pending-entries list (PEL) of in-flight messages it has + * been handed. Once a consumer has processed an entry it acknowledges + * it with {@code XACK}; entries left unacknowledged past an idle + * threshold can be swept to a healthy consumer with {@code XAUTOCLAIM} + * (or specific entries can be reassigned by ID with {@code XCLAIM}).

+ * + *

Each {@code XADD} carries an approximate {@code MAXLEN} so the + * stream stays bounded as it rolls forward. {@code XRANGE} supports + * replay over the retained history for debugging, audit, or rebuilding + * a downstream projection. Note that approximate trimming can release + * entries that are still in a group's PEL: those entries appear in + * {@code XAUTOCLAIM}'s deleted-IDs list, which the caller should log + * and route to a dead-letter store. Redis 7+ removes them from the PEL + * inside the {@code XAUTOCLAIM} call itself, so no explicit + * {@code XACK} is needed.

+ * + *

The same stream can be read by any number of consumer groups: each + * group has its own cursor and its own pending lists, so analytics, + * notifications, and audit can all process the full event flow at their + * own pace without coordinating with each other.

+ */ +public class EventStream { + + public static final String DEFAULT_STREAM_KEY = "demo:events:orders"; + public static final int DEFAULT_MAXLEN_APPROX = 10_000; + public static final int DEFAULT_CLAIM_MIN_IDLE_MS = 15_000; + + private final JedisPool pool; + private final String streamKey; + private final int maxlenApprox; + private final int claimMinIdleMs; + + private final Object statsLock = new Object(); + private long producedTotal; + private long ackedTotal; + private long claimedTotal; + + public EventStream(JedisPool pool) { + this(pool, DEFAULT_STREAM_KEY, DEFAULT_MAXLEN_APPROX, DEFAULT_CLAIM_MIN_IDLE_MS); + } + + public EventStream(JedisPool pool, String streamKey, int maxlenApprox, int claimMinIdleMs) { + if (pool == null) { + throw new IllegalArgumentException("pool is required"); + } + this.pool = pool; + this.streamKey = (streamKey == null || streamKey.isEmpty()) + ? DEFAULT_STREAM_KEY : streamKey; + this.maxlenApprox = maxlenApprox > 0 ? maxlenApprox : DEFAULT_MAXLEN_APPROX; + this.claimMinIdleMs = claimMinIdleMs >= 0 ? claimMinIdleMs : DEFAULT_CLAIM_MIN_IDLE_MS; + } + + public String getStreamKey() { + return streamKey; + } + + public int getMaxlenApprox() { + return maxlenApprox; + } + + public int getClaimMinIdleMs() { + return claimMinIdleMs; + } + + /** A single stream entry: id plus its flat field/value map. */ + public static final class Entry { + public final String id; + public final Map fields; + + public Entry(String id, Map fields) { + this.id = id; + this.fields = fields; + } + } + + /** Result of one {@link #autoclaim} sweep across the PEL. 
*/ + public static final class AutoClaimResult { + public final List<Entry> claimed; + public final List<String> deletedIds; + + public AutoClaimResult(List<Entry> claimed, List<String> deletedIds) { + this.claimed = claimed; + this.deletedIds = deletedIds; + } + } + + // ------------------------------------------------------------------ + // Producer + // ------------------------------------------------------------------ + + /** Append a single event. Returns the stream ID Redis assigned. */ + public String produce(String eventType, Map<String, String> payload) { + List<Map.Entry<String, Map<String, String>>> events = new ArrayList<>(1); + events.add(new java.util.AbstractMap.SimpleEntry<>(eventType, payload)); + return produceBatch(events).get(0); + } + + /** + * Pipeline several {@code XADD} calls in one round trip. + * + *

Each entry carries an approximate {@code MAXLEN} cap. The + * {@code ~} flavour lets Redis trim at a macro-node boundary, which + * is much cheaper than exact trimming and is the right call for a + * retention guardrail rather than a hard size limit.

+ */ + public List<String> produceBatch(List<Map.Entry<String, Map<String, String>>> events) { + List<String> ids = new ArrayList<>(events.size()); + try (Jedis jedis = pool.getResource()) { + Pipeline pipe = jedis.pipelined(); + List<Response<StreamEntryID>> responses = new ArrayList<>(events.size()); + for (Map.Entry<String, Map<String, String>> event : events) { + Map<String, String> fields = encodeFields(event.getKey(), event.getValue()); + XAddParams params = new XAddParams() + .maxLen(maxlenApprox) + .approximateTrimming(); + responses.add(pipe.xadd(streamKey, params, fields)); + } + pipe.sync(); + for (Response<StreamEntryID> resp : responses) { + StreamEntryID id = resp.get(); + ids.add(id == null ? null : id.toString()); + } + } + synchronized (statsLock) { + producedTotal += ids.size(); + } + return ids; + } + + private static Map<String, String> encodeFields(String eventType, Map<String, String> payload) { + Map<String, String> fields = new LinkedHashMap<>(); + fields.put("type", eventType); + fields.put("ts_ms", Long.toString(System.currentTimeMillis())); + if (payload != null) { + for (Map.Entry<String, String> e : payload.entrySet()) { + String key = e.getKey(); + String value = e.getValue(); + fields.put(key, value == null ? "" : value); + } + } + return fields; + } + + // ------------------------------------------------------------------ + // Consumer groups + // ------------------------------------------------------------------ + + /** + * Create the consumer group if it doesn't exist. + * + *

{@code $} means "deliver only events appended after this point"; + * pass {@code 0-0} to replay the entire stream into a fresh group. + * {@code BUSYGROUP} errors (group already exists) are swallowed.

+ */ + public void ensureGroup(String group, String startId) { + try (Jedis jedis = pool.getResource()) { + jedis.xgroupCreate(streamKey, group, new StreamEntryID(startId), true); + } catch (JedisDataException exc) { + if (exc.getMessage() == null || !exc.getMessage().contains("BUSYGROUP")) { + throw exc; + } + } + } + + public long deleteGroup(String group) { + try (Jedis jedis = pool.getResource()) { + return jedis.xgroupDestroy(streamKey, group); + } catch (JedisDataException exc) { + return 0L; + } + } + + /** + * Read new entries for this consumer via {@code XREADGROUP}. + * + *

The {@code >} ID means "deliver entries this consumer group has + * not delivered to anyone yet" — the at-least-once path. + * Replaying an explicit ID instead would re-deliver entries already + * in this consumer's pending list (see {@link #consumeOwnPel} for + * that recovery path).

+ */ + public List<Entry> consume(String group, String consumer, int count, long blockMs) { + try (Jedis jedis = pool.getResource()) { + XReadGroupParams params = new XReadGroupParams() + .count(count) + .block((int) blockMs); + Map<String, StreamEntryID> streams = new LinkedHashMap<>(); + // ``UNRECEIVED_ENTRY`` is the Jedis sentinel that serialises + // to the special ID ``>``: "deliver entries this group has + // not yet delivered to anyone". Same field name across 5.x + // and 6.x; 6.x also exposes it as ``XREADGROUP_UNDELIVERED_ENTRY``. + streams.put(streamKey, StreamEntryID.UNRECEIVED_ENTRY); + List<Map.Entry<String, List<StreamEntry>>> result = + jedis.xreadGroup(group, consumer, params, streams); + return flattenEntries(result); + } + } + + /** + * Re-deliver entries already in this consumer's PEL. + * + *

Reading with an explicit ID ({@code 0-0}) instead of {@code >} + * replays the entries already assigned to this consumer name + * without advancing the group's {@code last-delivered-id}. This is + * the canonical recovery path after a crash on the same consumer + * name, and is also how a consumer picks up entries that another + * consumer (or {@code XAUTOCLAIM}) handed to it.

+ */ + public List<Entry> consumeOwnPel(String group, String consumer, int count) { + try (Jedis jedis = pool.getResource()) { + XReadGroupParams params = new XReadGroupParams().count(count); + Map<String, StreamEntryID> streams = new LinkedHashMap<>(); + streams.put(streamKey, new StreamEntryID(0L, 0L)); + List<Map.Entry<String, List<StreamEntry>>> result = + jedis.xreadGroup(group, consumer, params, streams); + return flattenEntries(result); + } + } + + public long ack(String group, List<String> ids) { + if (ids == null || ids.isEmpty()) { + return 0L; + } + StreamEntryID[] entryIds = new StreamEntryID[ids.size()]; + for (int i = 0; i < ids.size(); i++) { + entryIds[i] = new StreamEntryID(ids.get(i)); + } + long acked; + try (Jedis jedis = pool.getResource()) { + acked = jedis.xack(streamKey, group, entryIds); + } + synchronized (statsLock) { + ackedTotal += acked; + } + return acked; + } + + /** + * Sweep idle pending entries to {@code consumer}. + * + *

A single {@code XAUTOCLAIM} call scans up to {@code pageCount} + * PEL entries starting at {@code startId} and returns a continuation + * cursor. For a full sweep of the PEL, loop until the cursor returns + * to {@code 0-0} or until {@code maxPages} pages have been scanned, a + * safety net so that a very large PEL cannot monopolise the call.

+ * + *

The {@code deletedIds} list contains PEL entries whose stream + * payload had already been trimmed by the time this sweep ran + * (typically because {@code MAXLEN ~} retention outran a slow + * consumer). {@code XAUTOCLAIM} removes those dangling slots from + * the PEL itself — the caller does not need to + * {@code XACK} them — but they cannot be retried, so log and route + * them to a dead-letter store for observability.

+ */ + public AutoClaimResult autoclaim( + String group, String consumer, int pageCount, String startId, int maxPages) { + List claimedAll = new ArrayList<>(); + List deletedAll = new ArrayList<>(); + String cursor = (startId == null || startId.isEmpty()) ? "0-0" : startId; + try (Jedis jedis = pool.getResource()) { + for (int page = 0; page < maxPages; page++) { + // Use sendCommand to get the raw 3-element reply + // (next-id, claimed-entries, deleted-ids). Jedis 6's + // typed xautoclaim wrapper hides the deleted-ids slot. + Object raw = jedis.sendCommand( + XAutoClaimRaw.XAUTOCLAIM, + streamKey, group, consumer, + Integer.toString(claimMinIdleMs), + cursor, + "COUNT", Integer.toString(pageCount)); + ParsedAutoClaim parsed = parseAutoClaim(raw); + claimedAll.addAll(parsed.claimed); + deletedAll.addAll(parsed.deletedIds); + if ("0-0".equals(parsed.nextCursor)) { + break; + } + cursor = parsed.nextCursor; + } + } + synchronized (statsLock) { + claimedTotal += claimedAll.size(); + } + return new AutoClaimResult(claimedAll, deletedAll); + } + + private enum XAutoClaimRaw implements ProtocolCommand { + XAUTOCLAIM; + + @Override + public byte[] getRaw() { + return SafeEncoder.encode("XAUTOCLAIM"); + } + } + + private static final class ParsedAutoClaim { + final String nextCursor; + final List claimed; + final List deletedIds; + + ParsedAutoClaim(String nextCursor, List claimed, List deletedIds) { + this.nextCursor = nextCursor; + this.claimed = claimed; + this.deletedIds = deletedIds; + } + } + + @SuppressWarnings("unchecked") + private static ParsedAutoClaim parseAutoClaim(Object raw) { + if (!(raw instanceof List)) { + return new ParsedAutoClaim("0-0", Collections.emptyList(), Collections.emptyList()); + } + List arr = (List) raw; + String nextCursor = decodeString(arr.get(0)); + List entriesRaw = arr.size() > 1 && arr.get(1) instanceof List + ? (List) arr.get(1) : Collections.emptyList(); + List deletedRaw = arr.size() > 2 && arr.get(2) instanceof List + ? 
(List) arr.get(2) : Collections.emptyList(); + + List claimed = new ArrayList<>(entriesRaw.size()); + for (Object item : entriesRaw) { + if (!(item instanceof List)) { + continue; + } + List entry = (List) item; + if (entry.size() < 2) { + continue; + } + String id = decodeString(entry.get(0)); + Map fields = decodeFieldArray(entry.get(1)); + claimed.add(new Entry(id, fields)); + } + List deleted = new ArrayList<>(deletedRaw.size()); + for (Object item : deletedRaw) { + deleted.add(decodeString(item)); + } + return new ParsedAutoClaim(nextCursor, claimed, deleted); + } + + @SuppressWarnings("unchecked") + private static Map decodeFieldArray(Object raw) { + Map fields = new LinkedHashMap<>(); + if (!(raw instanceof List)) { + return fields; + } + List arr = (List) raw; + for (int i = 0; i + 1 < arr.size(); i += 2) { + String key = decodeString(arr.get(i)); + String value = decodeString(arr.get(i + 1)); + fields.put(key, value == null ? "" : value); + } + return fields; + } + + private static String decodeString(Object raw) { + if (raw == null) { + return null; + } + if (raw instanceof byte[]) { + return SafeEncoder.encode((byte[]) raw); + } + return raw.toString(); + } + + /** + * Drop a consumer from a group. + * + *

{@code XGROUP DELCONSUMER} destroys this consumer's PEL entries: + * any entry it still owned is no longer tracked anywhere in the + * group, and {@code XAUTOCLAIM} will never find it again. Always + * call {@link #handoverPending} (or reassign the entries manually + * with {@code XCLAIM}) to move them to a healthy consumer first; + * this method is the raw destructive call and is exposed only for + * explicit cleanup.

+ */ + public long deleteConsumer(String group, String consumer) { + try (Jedis jedis = pool.getResource()) { + return jedis.xgroupDelConsumer(streamKey, group, consumer); + } catch (JedisDataException exc) { + return 0L; + } + } + + /** + * Move every PEL entry owned by {@code fromConsumer} to + * {@code toConsumer}. + * + *

Enumerates the source consumer's PEL with {@code XPENDING ... + * CONSUMER} and reassigns each ID with {@code XCLAIM} at zero idle + * time so the move is unconditional. ({@code XAUTOCLAIM} does not + * filter by source consumer, so it cannot be used for a per-consumer + * handover.)

+ * + *

Call this before {@link #deleteConsumer} whenever the source + * still has pending entries — otherwise {@code XGROUP DELCONSUMER} + * would silently destroy them and they could never be recovered.

+ */ + public int handoverPending(String group, String fromConsumer, String toConsumer, int batch) { + int total = 0; + try (Jedis jedis = pool.getResource()) { + while (true) { + XPendingParams params = new XPendingParams() + .idle(0L) + .count(batch) + .consumer(fromConsumer); + List<StreamPendingEntry> rows = jedis.xpending(streamKey, group, params); + if (rows == null || rows.isEmpty()) { + break; + } + StreamEntryID[] ids = new StreamEntryID[rows.size()]; + for (int i = 0; i < rows.size(); i++) { + ids[i] = rows.get(i).getID(); + } + XClaimParams claimParams = new XClaimParams(); + List<StreamEntry> claimed = jedis.xclaim( + streamKey, group, toConsumer, 0L, claimParams, ids); + total += (claimed == null ? 0 : claimed.size()); + if (rows.size() < batch) { + break; + } + } + } + synchronized (statsLock) { + claimedTotal += total; + } + return total; + } + + // ------------------------------------------------------------------ + // Replay, length, trim + // ------------------------------------------------------------------ + + /** + * Range read with {@code XRANGE} for replay or audit. + * + *

Read-only: ranges do not update any group cursor and do not ack + * anything. Useful for bootstrapping a new projection, for building + * an audit view, or for debugging what actually went through the + * stream.

+ */ + public List replay(String startId, String endId, int count) { + try (Jedis jedis = pool.getResource()) { + List rows = jedis.xrange(streamKey, startId, endId, count); + return toEntries(rows); + } + } + + public long length() { + try (Jedis jedis = pool.getResource()) { + return jedis.xlen(streamKey); + } catch (JedisDataException exc) { + return 0L; + } + } + + public long trimMaxlen(long maxlen) { + try (Jedis jedis = pool.getResource()) { + XTrimParams params = new XTrimParams().maxLen(maxlen).approximateTrimming(); + return jedis.xtrim(streamKey, params); + } catch (JedisDataException exc) { + return 0L; + } + } + + public long trimMinid(String minid) { + try (Jedis jedis = pool.getResource()) { + XTrimParams params = new XTrimParams().minId(minid).approximateTrimming(); + return jedis.xtrim(streamKey, params); + } catch (JedisDataException exc) { + return 0L; + } + } + + // ------------------------------------------------------------------ + // Inspection + // ------------------------------------------------------------------ + + /** Subset of {@code XINFO STREAM} that's safe to JSON-encode. */ + public Map infoStream() { + Map info = new LinkedHashMap<>(); + info.put("length", 0L); + info.put("last_generated_id", null); + info.put("first_entry_id", null); + info.put("last_entry_id", null); + try (Jedis jedis = pool.getResource()) { + StreamInfo raw = jedis.xinfoStream(streamKey); + if (raw == null) { + return info; + } + info.put("length", raw.getLength()); + info.put("last_generated_id", + raw.getLastGeneratedId() == null ? null : raw.getLastGeneratedId().toString()); + StreamEntry first = raw.getFirstEntry(); + StreamEntry last = raw.getLastEntry(); + info.put("first_entry_id", first == null ? null : first.getID().toString()); + info.put("last_entry_id", last == null ? null : last.getID().toString()); + } catch (JedisDataException exc) { + // Key does not exist yet — return the default empty info. 
+ } + return info; + } + + public List> infoGroups() { + List> out = new ArrayList<>(); + try (Jedis jedis = pool.getResource()) { + List rows = jedis.xinfoGroups(streamKey); + if (rows == null) { + return out; + } + for (StreamGroupInfo row : rows) { + Map g = new LinkedHashMap<>(); + g.put("name", row.getName()); + g.put("consumers", row.getConsumers()); + g.put("pending", row.getPending()); + g.put("last_delivered_id", + row.getLastDeliveredId() == null ? null : row.getLastDeliveredId().toString()); + Map rawInfo = row.getGroupInfo(); + Object lag = rawInfo == null ? null : rawInfo.get("lag"); + g.put("lag", lag instanceof Number ? ((Number) lag).longValue() : null); + out.add(g); + } + } catch (JedisDataException exc) { + // Stream key not present. + } + return out; + } + + public List> infoConsumers(String group) { + List> out = new ArrayList<>(); + try (Jedis jedis = pool.getResource()) { + List rows = jedis.xinfoConsumers2(streamKey, group); + if (rows == null) { + return out; + } + for (StreamConsumerInfo row : rows) { + Map c = new LinkedHashMap<>(); + c.put("name", row.getName()); + c.put("pending", (long) row.getPending()); + c.put("idle_ms", row.getIdle()); + out.add(c); + } + } catch (JedisDataException exc) { + // Group does not exist. + } + return out; + } + + /** Per-entry PEL view (id, consumer, idle, deliveries). 
*/ + public List> pendingDetail(String group, int count) { + List> out = new ArrayList<>(); + try (Jedis jedis = pool.getResource()) { + XPendingParams params = new XPendingParams() + .start(StreamEntryID.MINIMUM_ID) + .end(StreamEntryID.MAXIMUM_ID) + .count(count); + List rows = jedis.xpending(streamKey, group, params); + if (rows == null) { + return out; + } + for (StreamPendingEntry row : rows) { + Map p = new LinkedHashMap<>(); + p.put("id", row.getID().toString()); + p.put("consumer", row.getConsumerName()); + p.put("idle_ms", row.getIdleTime()); + p.put("deliveries", row.getDeliveredTimes()); + out.add(p); + } + } catch (JedisDataException exc) { + // Group does not exist or stream missing. + } + return out; + } + + /** Reverse-range tail read, used by the demo to render the most recent entries. */ + public List tail(int count) { + try (Jedis jedis = pool.getResource()) { + List rows = jedis.xrevrange(streamKey, "+", "-", count); + return toEntries(rows); + } catch (JedisDataException exc) { + return new ArrayList<>(); + } + } + + public Map stats() { + Map snapshot = new LinkedHashMap<>(); + synchronized (statsLock) { + snapshot.put("produced_total", producedTotal); + snapshot.put("acked_total", ackedTotal); + snapshot.put("claimed_total", claimedTotal); + } + return snapshot; + } + + public void resetStats() { + synchronized (statsLock) { + producedTotal = 0L; + ackedTotal = 0L; + claimedTotal = 0L; + } + } + + // ------------------------------------------------------------------ + // Demo housekeeping + // ------------------------------------------------------------------ + + /** Drop the stream key entirely. Used by the demo's reset path. 
*/ + public void deleteStream() { + try (Jedis jedis = pool.getResource()) { + jedis.del(streamKey); + } + } + + // ------------------------------------------------------------------ + // Helpers + // ------------------------------------------------------------------ + + private static List flattenEntries( + List>> raw) { + List out = new ArrayList<>(); + if (raw == null) { + return out; + } + for (Map.Entry> stream : raw) { + for (StreamEntry entry : stream.getValue()) { + out.add(new Entry(entry.getID().toString(), entry.getFields())); + } + } + return out; + } + + private static List toEntries(List rows) { + List out = new ArrayList<>(); + if (rows == null) { + return out; + } + for (StreamEntry row : rows) { + out.add(new Entry(row.getID().toString(), row.getFields())); + } + return out; + } +} diff --git a/content/develop/use-cases/streaming/java-jedis/_index.md b/content/develop/use-cases/streaming/java-jedis/_index.md new file mode 100644 index 0000000000..5964eb6fba --- /dev/null +++ b/content/develop/use-cases/streaming/java-jedis/_index.md @@ -0,0 +1,502 @@ +--- +categories: +- docs +- develop +- stack +- oss +- rs +- rc +description: Implement a Redis event-streaming pipeline in Java with Jedis +linkTitle: Jedis example (Java) +title: Redis streaming with Jedis +weight: 4 +--- + +This guide shows you how to build a Redis-backed event-streaming pipeline in Java with the [Jedis]({{< relref "/develop/clients/jedis" >}}) client library. It includes a small local web server built on the JDK's `com.sun.net.httpserver` so you can produce events into a single Redis Stream, watch two independent consumer groups read it at their own pace, and recover stuck deliveries with `XAUTOCLAIM` after simulating a consumer crash. + +## Overview + +A Redis Stream is an append-only log of field/value entries with auto-generated, time-ordered IDs. 
Producers append with [`XADD`]({{< relref "/commands/xadd" >}}); consumers belong to *consumer groups* and read with [`XREADGROUP`]({{< relref "/commands/xreadgroup" >}}). The group as a whole tracks a single `last-delivered-id` cursor, and each consumer gets its own pending-entries list (PEL) of messages it has been handed but not yet acknowledged. Once a consumer has processed an entry it calls [`XACK`]({{< relref "/commands/xack" >}}) to clear the entry from its PEL; entries left unacknowledged past an idle threshold can be reassigned to a healthy consumer with [`XAUTOCLAIM`]({{< relref "/commands/xautoclaim" >}}). + +That gives you: + +* Ordered, durable history that many independent consumer groups can read at their own pace +* At-least-once delivery, with per-consumer pending lists and automatic recovery of crashed consumers +* Horizontal scaling within a group — add a consumer and Redis automatically splits the work +* Replay of any range with [`XRANGE`]({{< relref "/commands/xrange" >}}), independent of consumer-group state +* Bounded retention through [`XADD MAXLEN ~`]({{< relref "/commands/xadd" >}}) or + [`XTRIM MINID ~`]({{< relref "/commands/xtrim" >}}), without a separate cleanup job + +In this example, producers append order events (`order.placed`, `order.paid`, `order.shipped`, `order.cancelled`) to a single stream at `demo:events:orders`. Two consumer groups read the same stream: + +* **`notifications`** — two consumers (`worker-a`, `worker-b`) sharing the work, modelling a fan-out worker pool. +* **`analytics`** — one consumer (`worker-c`) processing the full event flow on its own. + +## How it works + +The flow looks like this: + +1. The application calls `stream.produce(eventType, payload)` which runs [`XADD`]({{< relref "/commands/xadd" >}}) with an approximate [`MAXLEN ~`]({{< relref "/commands/xadd" >}}) cap. Redis assigns an auto-generated time-ordered ID. +2. 
Each consumer thread loops on [`XREADGROUP`]({{< relref "/commands/xreadgroup" >}}) with the special ID `>` (meaning "deliver entries this group has not yet delivered to anyone") and a short block timeout. +3. After processing each entry, the consumer calls [`XACK`]({{< relref "/commands/xack" >}}) so Redis can drop it from the group's pending list. +4. If a consumer is killed (or crashes) before acking, its entries sit in the group's PEL. A periodic [`XAUTOCLAIM`]({{< relref "/commands/xautoclaim" >}}) sweep reassigns idle entries to a healthy consumer. +5. Anyone, including code outside the consumer groups, can read history with [`XRANGE`]({{< relref "/commands/xrange" >}}) without affecting any group's cursor. + +Each consumer group has its own cursor (`last-delivered-id`) and its own pending list, so the two groups in this demo process the same events without coordinating with each other. + +## The event-stream helper + +The `EventStream` class wraps the stream operations +([source](https://github.com/redis/docs/blob/main/content/develop/use-cases/streaming/java-jedis/EventStream.java)): + +```java +import java.util.LinkedHashMap; +import java.util.List; +import java.util.Map; + +import redis.clients.jedis.JedisPool; +import redis.clients.jedis.JedisPoolConfig; + +JedisPool pool = new JedisPool(new JedisPoolConfig(), "localhost", 6379); +EventStream stream = new EventStream( + pool, + "demo:events:orders", + 2000, // approximate MAXLEN retention guardrail + 5000); // XAUTOCLAIM idle threshold (ms) + +// Producer +Map<String, String> payload = new LinkedHashMap<>(); +payload.put("order_id", "o-1234"); +payload.put("customer", "alice"); +payload.put("amount", "49.50"); +String streamId = stream.produce("order.placed", payload); + +// Consumer group + one consumer +stream.ensureGroup("notifications", "0-0"); +List<EventStream.Entry> entries = + stream.consume("notifications", "worker-a", 10, 500L); +for (EventStream.Entry entry : entries) { + handle(entry.fields); // your processing + stream.ack("notifications", List.of(entry.id)); // XACK +} + +// Recover stuck PEL entries by
reaping them into a healthy consumer. +// The textbook pattern: each consumer periodically calls XAUTOCLAIM +// with itself as the target and processes whatever it claimed. +// ConsumerWorker.reapIdlePel wraps that flow; the low-level helper +// stream.autoclaim(group, target, ...) is also available if you want +// to drive XAUTOCLAIM directly. +ConsumerWorker.ReapResult result = workerB.reapIdlePel(); +// result.claimed == number of entries pulled into this consumer's PEL +// result.processed == number that were handled + acked +// result.deletedIds == PEL entries whose payload was already trimmed. +// Redis 7+ has already removed those slots from the PEL, so no XACK is +// needed — log them and route to a dead-letter store for audit. + +// Replay history (independent of any group's cursor) +for (EventStream.Entry entry : stream.replay("-", "+", 50)) { + System.out.println(entry.id + " " + entry.fields); +} +``` + +### Data model + +Each event is a single stream entry — a flat map of field/value strings — with an auto-generated time-ordered ID: + +```text +demo:events:orders + 1716998413541-0 type=order.placed order_id=o-1234 customer=alice amount=49.50 ts_ms=... + 1716998413542-0 type=order.paid order_id=o-1234 customer=alice amount=49.50 ts_ms=... + 1716998413542-1 type=order.shipped order_id=o-1235 customer=bob amount=12.00 ts_ms=... + ... +``` + +The ID is `{milliseconds}-{sequence}`, monotonically increasing within the stream, so you can range-query by approximate wall-clock time without an extra index. (IDs are ordered within a stream, not across streams — two events appended to different streams at the same millisecond can produce the same ID.) The implementation uses: + +* [`XADD ... 
MAXLEN ~ n`]({{< relref "/commands/xadd" >}}), pipelined, for batch production with a retention cap +* [`XREADGROUP`]({{< relref "/commands/xreadgroup" >}}) with the special ID `>` for fresh deliveries to a consumer +* [`XACK`]({{< relref "/commands/xack" >}}) on every processed entry +* [`XAUTOCLAIM`]({{< relref "/commands/xautoclaim" >}}) for sweeping idle pending entries to a healthy consumer +* [`XRANGE`]({{< relref "/commands/xrange" >}}) for replay and audit +* [`XPENDING`]({{< relref "/commands/xpending" >}}) for inspecting the per-group pending list +* [`XINFO STREAM`]({{< relref "/commands/xinfo-stream" >}}), + [`XINFO GROUPS`]({{< relref "/commands/xinfo-groups" >}}), and + [`XINFO CONSUMERS`]({{< relref "/commands/xinfo-consumers" >}}) for surface-level observability +* [`XTRIM`]({{< relref "/commands/xtrim" >}}) for explicit retention enforcement + +## Producing events + +`produceBatch` pipelines `XADD` calls in a single round trip. Each call carries an approximate `MAXLEN ~` cap so the stream stays bounded as it rolls forward: + +```java +public List<String> produceBatch(List<Map.Entry<String, Map<String, String>>> events) { + List<String> ids = new ArrayList<>(events.size()); + try (Jedis jedis = pool.getResource()) { + Pipeline pipe = jedis.pipelined(); + List<Response<StreamEntryID>> responses = new ArrayList<>(events.size()); + for (Map.Entry<String, Map<String, String>> event : events) { + Map<String, String> fields = encodeFields(event.getKey(), event.getValue()); + XAddParams params = new XAddParams() + .maxLen(maxlenApprox) + .approximateTrimming(); + responses.add(pipe.xadd(streamKey, params, fields)); + } + pipe.sync(); + for (Response<StreamEntryID> resp : responses) { + StreamEntryID id = resp.get(); + ids.add(id == null ? null : id.toString()); + } + } + return ids; +} +``` + +The `~` flavour of `MAXLEN` lets Redis trim at a macro-node boundary, which is much cheaper than exact trimming and is what you want when the cap is a retention *guardrail*, not a hard size constraint.
With 300 events produced and `MAXLEN ~ 50`, you might end up with 100 entries left: Redis released whole macro-nodes from the old end of the stream (100 entries each with the default `stream-node-max-entries`) and stopped before dropping below the cap. Later `XADD` calls keep trimming in whole-node steps, so the length stays bounded rather than exact. + +If you genuinely need an exact cap (rare), drop the `.approximateTrimming()` call. Exact trimming is noticeably more expensive on busy streams. + +## Reading with a consumer group + +Each consumer in a group runs the same `XREADGROUP` loop. The special ID `>` means "deliver entries this group has not yet delivered to *anyone*": + +```java +public List<Entry> consume(String group, String consumer, int count, long blockMs) { + try (Jedis jedis = pool.getResource()) { + XReadGroupParams params = new XReadGroupParams() + .count(count) + .block((int) blockMs); + Map<String, StreamEntryID> streams = new LinkedHashMap<>(); + streams.put(streamKey, StreamEntryID.UNRECEIVED_ENTRY); + List<Map.Entry<String, List<StreamEntry>>> result = + jedis.xreadGroup(group, consumer, params, streams); + return flattenEntries(result); + } +} +``` + +`blockMs` makes the call efficient even when the stream is idle: the client parks on the server until either an entry arrives or the timeout expires, so consumers don't busy-loop. The Jedis instance is acquired from a `JedisPool` with try-with-resources, so the blocking call holds *its* connection for the duration of the block but other handlers and workers can grab their own connections in parallel. + +Reading with an explicit ID like `0-0` instead of `>` does something different: it replays entries already delivered to *this* consumer name (its private PEL). That is the canonical recovery path when the same consumer restarts: catch up on its own pending entries first, then resume reading new ones (`consumeOwnPel` exposes that read).
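
The restart path described above can be sketched as a one-off drain that runs before the normal `>` loop. This is a minimal sketch and not the demo's code: `PendingReader` and `InMemoryPel` are hypothetical stand-ins for the stream helper (the real `consumeOwnPel` signature is not shown here), and the in-memory fake only models IDs with a `-0` sequence part.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class RestartCatchUp {

    /** Hypothetical stand-in for the two stream calls the catch-up needs. */
    public interface PendingReader {
        /** Models XREADGROUP with an explicit ID: own pending IDs greater than afterId. */
        List<String> readOwnPending(String afterId, int count);

        /** Models XACK for a single entry. */
        void ack(String entryId);
    }

    /** Drains this consumer's own PEL once and returns how many entries were acked. */
    public static int drainOwnPel(PendingReader reader, int count) {
        int acked = 0;
        String cursor = "0-0";
        while (true) {
            List<String> batch = reader.readOwnPending(cursor, count);
            if (batch.isEmpty()) {
                return acked; // nothing left: the caller switches to reading with ">"
            }
            for (String id : batch) {
                // Real code would process the entry here and ack only on success.
                reader.ack(id);
                acked++;
            }
            // Advance past the batch so the loop ends even if an ack were skipped.
            cursor = batch.get(batch.size() - 1);
        }
    }

    /** In-memory fake used only to exercise the loop; it is not a Redis client. */
    public static final class InMemoryPel implements PendingReader {
        private final TreeMap<Long, String> pending = new TreeMap<>();

        public InMemoryPel(List<String> ids) {
            for (String id : ids) {
                pending.put(millis(id), id);
            }
        }

        @Override
        public List<String> readOwnPending(String afterId, int count) {
            List<String> out = new ArrayList<>();
            for (String id : pending.tailMap(millis(afterId), false).values()) {
                if (out.size() == count) {
                    break;
                }
                out.add(id);
            }
            return out;
        }

        @Override
        public void ack(String entryId) {
            pending.remove(millis(entryId));
        }

        public int size() {
            return pending.size();
        }

        private static long millis(String id) {
            return Long.parseLong(id.substring(0, id.indexOf('-')));
        }
    }
}
```

Run this once at startup, not on every loop iteration: re-reading pending entries resets their idle time, which would keep them below the `XAUTOCLAIM` threshold.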
+ +## Acknowledging entries + +Once the consumer has processed an entry, `XACK` tells Redis it can drop the entry from the group's pending list: + +```java +public long ack(String group, List ids) { + if (ids == null || ids.isEmpty()) { + return 0L; + } + StreamEntryID[] entryIds = new StreamEntryID[ids.size()]; + for (int i = 0; i < ids.size(); i++) { + entryIds[i] = new StreamEntryID(ids.get(i)); + } + try (Jedis jedis = pool.getResource()) { + return jedis.xack(streamKey, group, entryIds); + } +} +``` + +This is the linchpin of at-least-once delivery: an entry that is never acked stays in the PEL until a claim moves it elsewhere. If your consumer thread crashes between processing and ack, the next claim sweep picks the entry back up. The one caveat is retention: `XADD MAXLEN ~` and `XTRIM` can release the entry's *payload* even while its ID is still in the PEL. The next `XAUTOCLAIM` returns those IDs in its `deletedIds` list and removes them from the PEL inside the same command — the entry cannot be retried, so the caller should log it and route to a dead-letter store for audit. The example handles this explicitly in `reapIdlePel` further down. + +The trade-off is the opposite of pub/sub: a slow or crashed consumer doesn't lose messages, but it does mean your downstream system must be idempotent. If you process an order twice because the first attempt died after the side effect but before the ack, the second attempt must be safe. + +## Multiple consumer groups, one stream + +The big difference between Redis Streams and a job queue is that any number of independent consumer groups can read the same stream. The demo sets up two groups on `demo:events:orders`: + +```java +stream.ensureGroup("notifications", "0-0"); +stream.ensureGroup("analytics", "0-0"); +``` + +Each group has its own cursor. Producing 5 events results in `notifications` and `analytics` each receiving all 5, with no coordination between them. 
Within `notifications`, the work is split across `worker-a` and `worker-b`: Redis hands each `XREADGROUP` call whatever entries are not yet delivered to anyone in the group, so adding a second worker doubles throughput without any rebalance logic. + +The `"0-0"` argument means "deliver everything in the stream from the beginning" — useful in a demo and for fresh groups bootstrapping from history. In production, a brand-new group reading a long-existing stream usually starts at `$` ("only events after this point") and uses [`XRANGE`]({{< relref "/commands/xrange" >}}) explicitly if it needs history. + +## Recovering crashed consumers with XAUTOCLAIM + +The demo's "Crash next 3" button tells a chosen consumer to drop its next three deliveries on the floor without acking them — the same effect as a worker process dying mid-message. Those entries stay in the group's PEL with their delivery counter incremented. Once they have been idle for at least `claimMinIdleMs`, any healthy consumer in the group can rescue them by calling `XAUTOCLAIM` *with itself as the target*. 
`ConsumerWorker.reapIdlePel` wraps that pattern: + +```java +public ReapResult reapIdlePel() { + EventStream.AutoClaimResult result = stream.autoclaim(group, name, 100, "0-0", 10); + int processedCount = 0; + for (EventStream.Entry entry : result.claimed) { + try { + handleEntry(entry.id, entry.fields); + processedCount++; + } catch (Exception exc) { + System.err.printf( + "[%s/%s] reap failed on %s: %s%n", + group, name, entry.id, exc); + } + } + return new ReapResult( + result.claimed.size(), processedCount, result.deletedIds); +} +``` + +The underlying `stream.autoclaim` helper pages through the group's PEL with `XAUTOCLAIM`'s continuation cursor: + +```java +public AutoClaimResult autoclaim( + String group, String consumer, int pageCount, String startId, int maxPages) { + List claimedAll = new ArrayList<>(); + List deletedAll = new ArrayList<>(); + String cursor = (startId == null || startId.isEmpty()) ? "0-0" : startId; + try (Jedis jedis = pool.getResource()) { + for (int page = 0; page < maxPages; page++) { + // Use sendCommand for the raw 3-element XAUTOCLAIM reply + // (next-id, claimed-entries, deleted-ids). Jedis 6's typed + // xautoclaim wrapper returns only the first two slots. + Object raw = jedis.sendCommand( + XAutoClaimRaw.XAUTOCLAIM, + streamKey, group, consumer, + Integer.toString(claimMinIdleMs), + cursor, + "COUNT", Integer.toString(pageCount)); + ParsedAutoClaim parsed = parseAutoClaim(raw); + claimedAll.addAll(parsed.claimed); + deletedAll.addAll(parsed.deletedIds); + if ("0-0".equals(parsed.nextCursor)) { + break; + } + cursor = parsed.nextCursor; + } + } + return new AutoClaimResult(claimedAll, deletedAll); +} +``` + +A single `XAUTOCLAIM` call scans up to `pageCount` PEL entries starting at `startId`, reassigns the ones idle for at least `minIdleTime` to the named consumer, and returns a continuation cursor in the first slot of the reply. 
For a full sweep, loop until the cursor returns to `0-0` (with a `maxPages` safety net so one call cannot monopolise a very large PEL). The delivery counter is incremented on every claim, so after a few cycles you can use it to spot a *poison-pill* message that crashes every consumer that touches it, and route it to a dead-letter stream so the bad entry stops cycling. (New entries keep flowing past the poison pill, because `XREADGROUP >` still delivers fresh work, but the bad entry's repeated reclaim wastes consumer time and keeps the PEL larger than it needs to be.) + +The `deletedIds` list contains PEL entry IDs whose stream payload was already trimmed by the time the claim ran (typically because `MAXLEN ~` retention outran a slow consumer). `XAUTOCLAIM` removes those dangling slots from the PEL itself, so the caller does *not* need to `XACK` them. The entries cannot be retried either, so log and route them to a dead-letter store for offline inspection. Redis 7.0 introduced this third return element; the example requires Redis 7.0+ for that reason. + +Jedis 6.x's typed `jedis.xautoclaim(...)` overload returns only the cursor and the claimed entries (as `Map.Entry<StreamEntryID, List<StreamEntry>>`); the deleted-IDs slot is dropped. To surface it, the helper sends `XAUTOCLAIM` through `Jedis.sendCommand` and parses all three reply elements by hand. + +`reapIdlePel` is the right primitive for the recovery path because it claims and processes in one step: every entry the call returned is now in *this* consumer's PEL, so the same consumer is responsible for processing and acking it. In production, each consumer thread runs `reapIdlePel` periodically (every few seconds, on a timer) so a crashed peer's entries never sit invisibly. The demo exposes it as a manual button so you can trigger the reap after waiting for the idle threshold.
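
The periodic reap mentioned above could be driven by a `ScheduledExecutorService`. The sketch below is one possible wiring and not code from the demo: `ReapScheduler` and its `reapOnce` parameter are hypothetical names, and in a real worker `reapOnce` would be something like `worker::reapIdlePel`.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public final class ReapScheduler {

    private ReapScheduler() {
    }

    /**
     * Runs reapOnce every periodMs on a single daemon thread.
     * The caller shuts the returned executor down when the worker stops.
     */
    public static ScheduledExecutorService start(Runnable reapOnce, long periodMs) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "pel-reaper");
            t.setDaemon(true);
            return t;
        });
        executor.scheduleWithFixedDelay(() -> {
            try {
                reapOnce.run();
            } catch (RuntimeException exc) {
                // An uncaught exception would cancel all future runs, so log and carry on.
                System.err.println("reap failed: " + exc);
            }
        }, periodMs, periodMs, TimeUnit.MILLISECONDS);
        return executor;
    }
}
```

Fixed delay measures the gap from the end of one sweep to the start of the next, so a slow sweep does not cause runs to pile up. A period close to the idle threshold is a reasonable starting point, since entries younger than the threshold are skipped anyway.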
+ +`XCLAIM` (singular, no auto) does the same thing for a specific list of entry IDs you already have in hand — useful when you want to take ownership of one known stuck entry, or when you need to move a specific consumer's PEL to a peer (the case the demo's "Remove consumer" button handles via `handoverPending`). `XAUTOCLAIM` cannot filter by source consumer, so it cannot be used for a per-consumer handover. + +## Replay with XRANGE + +`XRANGE` reads a slice of history. It is completely independent of any consumer group — no cursors move, no acks happen — so it is safe to call any number of times, from any process: + +```java +public List replay(String startId, String endId, int count) { + try (Jedis jedis = pool.getResource()) { + List rows = jedis.xrange(streamKey, startId, endId, count); + return toEntries(rows); + } +} +``` + +The special IDs `-` and `+` mean "from the very beginning" and "to the very end". You can also pass real IDs (`1716998413541-0`) or just the millisecond part (`1716998413541`, which Redis interprets as "any entry with this timestamp"). + +Typical uses: + +* **Bootstrapping a new projection** — read the entire stream from `-` and build a derived view in another store (a search index, a SQL table, a different cache). Doing this against a consumer group would consume the entries; `XRANGE` lets you do it without disrupting live consumers. +* **Auditing recent activity** — read the last few minutes by ID range without touching any group cursor. +* **Debugging** — fetch one specific entry by its ID, or a tight range around an incident timestamp, to see exactly what producers wrote. 
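
For the auditing case, the range bounds can be derived from wall-clock time because the first part of every ID is a millisecond timestamp. A minimal sketch, assuming a hypothetical `StreamIds` helper that is not part of the demo source:

```java
import java.time.Duration;

public final class StreamIds {

    private StreamIds() {
    }

    /** Lowest possible ID at or after (nowMs - window), usable as an XRANGE start. */
    public static String startIdFor(long nowMs, Duration window) {
        long cutoffMs = Math.max(0L, nowMs - window.toMillis());
        return cutoffMs + "-0";
    }

    /** Extracts the millisecond part of an ID such as "1716998413541-0". */
    public static long millisOf(String entryId) {
        int dash = entryId.indexOf('-');
        return Long.parseLong(dash < 0 ? entryId : entryId.substring(0, dash));
    }
}
```

A call such as `stream.replay(StreamIds.startIdFor(System.currentTimeMillis(), Duration.ofMinutes(5)), "+", 500)` would then read roughly the last five minutes. The window is approximate because auto-generated IDs come from the Redis server's clock, which may differ slightly from the application's.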
+ +## The consumer worker thread + +`ConsumerWorker` wraps the `XREADGROUP` → process → `XACK` loop in a daemon thread +([source](https://github.com/redis/docs/blob/main/content/develop/use-cases/streaming/java-jedis/ConsumerWorker.java)): + +```java +private void run() { + while (!stopRequested) { + if (paused) { + sleep(50L); + continue; + } + List entries; + try { + entries = stream.consume(group, name, 10, 500L); + } catch (Exception exc) { + System.err.printf("[%s/%s] read failed: %s%n", group, name, exc); + sleep(500L); + continue; + } + for (EventStream.Entry entry : entries) { + dispatch(entry.id, entry.fields); + } + } +} +``` + +`dispatch` calls `handleEntry`, which either acks (the normal path) or, when the demo has asked the worker to "crash", drops the entry on the floor and increments a counter so the UI can show what is currently in the PEL waiting to be claimed. A failure inside the per-entry handler (typically an `XACK` against Redis) is caught and logged so a single bad entry can never kill the daemon thread — that would silently halt this consumer while every other entry sat in its PEL waiting for `XAUTOCLAIM`. + +Recovery of stuck PEL entries — this consumer's, after a restart, or another consumer's, after a crash — runs through a separate `reapIdlePel` method rather than the read loop. That method calls `XAUTOCLAIM` with this consumer as the target, then processes whatever was claimed in the same flow as new entries. This is the textbook Streams pattern: each consumer is its own reaper, running `XAUTOCLAIM(self)` periodically (or on demand) so a crashed peer's entries never sit invisibly in the PEL. The demo's "XAUTOCLAIM to selected" button calls `reapIdlePel` on the chosen consumer; in production you would run it from a timer every few seconds. + +Note that the worker's main read loop deliberately does *not* call `XREADGROUP 0` to drain its own PEL on every iteration. 
That would re-deliver every pending entry continuously and *reset its idle counter to zero* each time, which would keep crashed entries below the `XAUTOCLAIM` threshold forever. Using `XAUTOCLAIM(self)` as the recovery primitive — which only fires for entries idle longer than `minIdleTime` — avoids that whole class of bug. + +The pause and crash levers exist only for the demo. A real consumer is just the read-process-ack loop — everything else in this class is instrumentation. + +## Prerequisites + +Before running the demo, make sure that: + +* Redis 7.0 or later is running and accessible. By default, the demo connects to `localhost:6379`. `XAUTOCLAIM` was added in Redis 6.2, but its reply gained a third element (the list of deleted IDs) in 7.0; the example relies on that shape. +* JDK 17 or later is installed (the demo's inline HTML uses text blocks, introduced as a preview in JDK 13 and finalised in JDK 15; 17+ keeps the demo on a current LTS). +* The Jedis JAR (5.0+; the example was tested against 6.2.0) and its dependencies are on your classpath. Get them from [Maven Central](https://repo1.maven.org/maven2/redis/clients/jedis/), or via Maven/Gradle in a project setup. You also need [`slf4j-api`](https://repo1.maven.org/maven2/org/slf4j/slf4j-api/) and [`commons-pool2`](https://repo1.maven.org/maven2/org/apache/commons/commons-pool2/), which Jedis declares as transitive dependencies. + +If your Redis server is running elsewhere, start the demo with `--redis-host` and `--redis-port`. + +## Running the demo + +### Get the source files + +The demo consists of three Java files. 
Download them from the [`java-jedis` source folder](https://github.com/redis/docs/tree/main/content/develop/use-cases/streaming/java-jedis) on GitHub, or grab them with `curl`: + +```bash +mkdir streaming-demo && cd streaming-demo +BASE=https://raw.githubusercontent.com/redis/docs/main/content/develop/use-cases/streaming/java-jedis +curl -O $BASE/EventStream.java +curl -O $BASE/ConsumerWorker.java +curl -O $BASE/DemoServer.java +``` + +Download the JARs into the same directory (the example assumes Jedis 6.2.0, `slf4j-api` 2.0.12, and `commons-pool2` 2.12.1 — adjust the filenames if you use different versions): + +```bash +JEDIS_VER=6.2.0 +SLF4J_VER=2.0.12 +POOL_VER=2.12.1 +curl -O https://repo1.maven.org/maven2/redis/clients/jedis/$JEDIS_VER/jedis-$JEDIS_VER.jar +curl -O https://repo1.maven.org/maven2/org/slf4j/slf4j-api/$SLF4J_VER/slf4j-api-$SLF4J_VER.jar +curl -O https://repo1.maven.org/maven2/org/apache/commons/commons-pool2/$POOL_VER/commons-pool2-$POOL_VER.jar +``` + +### Start the demo server + +From that directory: + +```bash +javac -cp jedis-6.2.0.jar:slf4j-api-2.0.12.jar:commons-pool2-2.12.1.jar \ + EventStream.java ConsumerWorker.java DemoServer.java + +java -cp .:jedis-6.2.0.jar:slf4j-api-2.0.12.jar:commons-pool2-2.12.1.jar \ + DemoServer --port 8083 --redis-host localhost --redis-port 6379 +``` + +You should see something like: + +```text +Deleting any existing data at key 'demo:events:orders' for a clean demo run (pass --no-reset to keep it). +Redis streaming demo server listening on http://127.0.0.1:8083 +Using Redis at localhost:6379 with stream key 'demo:events:orders' (MAXLEN ~ 2000) +Seeded 3 consumer(s) across 2 group(s) +``` + +By default the demo wipes the configured stream key on startup so each run starts from a clean state. Pass `--no-reset` to keep any existing data at the key (useful when re-running against the same stream to inspect prior state), or `--stream-key ` to point the demo at a different key entirely. 
+ +Open [http://127.0.0.1:8083](http://127.0.0.1:8083) in a browser. You can: + +* **Produce** any number of events of a chosen type (or random types). Watch the stream length grow and the tail update. +* See each **consumer group**: its `last-delivered-id`, the size of its pending list, and the consumers in it. Each consumer shows its processed count, pending count, and idle time. +* **Add or remove** consumers within a group at runtime to see Redis split the work across the new shape. +* Click **Crash next 3** on a consumer to drop its next three deliveries — the same effect as a worker process dying after `XREADGROUP` but before `XACK`. Watch the **Pending entries (XPENDING)** panel fill up. +* Wait until the idle time exceeds the threshold (default 5000 ms), pick a healthy target consumer, and click **XAUTOCLAIM to selected** — the stuck entries are reassigned and the delivery counter increments. +* **Replay (XRANGE)** any range to confirm the full history is independent of consumer-group state. +* **XTRIM** with an approximate `MAXLEN` to bound retention. Note that an approximate trim only releases whole macro-nodes — `MAXLEN ~ 50` on a small stream may not delete anything; on a 300-entry stream it typically lands at around 100. +* Click **Reset demo** to drop the stream and re-seed the default groups. 
+ +The demo server uses standard JDK libraries for HTTP handling and concurrency: + +* [`com.sun.net.httpserver.HttpServer`](https://docs.oracle.com/en/java/javase/21/docs/api/jdk.httpserver/com/sun/net/httpserver/HttpServer.html) for the web server, with a 16-thread fixed pool from `Executors.newFixedThreadPool(16)` +* [`java.util.concurrent.locks.ReentrantLock`](https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/util/concurrent/locks/ReentrantLock.html) for the in-process consumer-worker registry so add/remove handlers cannot race +* Daemon threads (one per `ConsumerWorker`) for the `XREADGROUP` → process → `XACK` loop + +## Production usage + +This guide uses a deliberately small local demo so you can focus on the streaming pattern. In production, you will usually want to harden several aspects of it. + +### Pick retention by length or by minimum ID + +The demo uses `MAXLEN ~` on every `XADD`. Two alternatives are worth considering: + +* `MINID ~ <id>`: keep only entries newer than an ID. If you want "the last 24 hours", compute the wall-clock cutoff in milliseconds and pass `XTRIM MINID ~ <cutoff-ms>-0`. This is the right pattern when retention is time-bounded. `EventStream.trimMinid` calls `XTRIM ... MINID ~`. +* No cap on `XADD` plus a periodic `XTRIM` job: useful if your producer is hot and the per-`XADD` work has to stay minimal, or if retention rules are complex (a separate process can also factor in consumer-group lag). + +All three approaches shown here use approximate trimming (the `~` modifier). Use exact trimming (`MAXLEN n` or `MINID id` without `~`) only when you genuinely need an exact count. + +### Don't let consumer-group lag silently grow + +`XINFO GROUPS` reports each group's `lag` (entries the group has not yet read) and `pending` (entries delivered but not acked).
In production, alert on either of these crossing a threshold — a steadily growing pending count usually means consumers are crashing without `XAUTOCLAIM` running, and a growing lag means consumers can't keep up with producers. + +The same applies inside a group: `XINFO CONSUMERS` reports per-consumer pending counts and idle times, so you can spot one slow consumer holding entries that the rest of the group is waiting on. + +### Make consumer logic idempotent + +`XAUTOCLAIM` can re-deliver an entry to a different consumer after a crash. If your processing has side effects (sending email, charging a card, updating a downstream store), make sure the same entry processed twice gives the same result — use an idempotency key, an upsert with conditional check, or a once-per-id guard table. Redis Streams cannot give you exactly-once semantics on its own. + +### Bound the delivery counter as a poison-pill signal + +`XPENDING` returns each entry's delivery count, incremented on every claim. If an entry has been delivered (and dropped) several times, the next consumer is unlikely to fare better. After some threshold — `deliveries >= 5`, say — route the entry to a *dead-letter stream*, ack it on the original group, and alert. New entries keep flowing past a poison pill (`XREADGROUP >` still delivers fresh work), but the bad entry's repeated reclaim wastes consumer time and keeps the PEL bigger than it needs to be — without a DLQ threshold it can also slowly trip retention/lag alerts. + +### Partition by tenant or entity for scale + +A single Redis Stream is a single key, and on a Redis Cluster a single key lives on a single shard. If your throughput exceeds what one shard can handle, partition the stream — for example by tenant ID (`events:orders:{tenant_a}`, `events:orders:{tenant_b}`) — so different tenants land on different shards. Hash-tags (`{tenant_a}`) keep all related streams on the same shard if you need to multi-stream atomically. 
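
To check where partitioned stream keys would land, you can compute the cluster hash slot locally. The sketch below follows the algorithm in the Redis Cluster specification (CRC16/XMODEM of the key, or of the hash-tag contents when a non-empty `{...}` section is present, modulo 16384). The `HashSlot` class name is hypothetical and is not part of the demo.

```java
import java.nio.charset.StandardCharsets;

public final class HashSlot {

    private HashSlot() {
    }

    /** Returns the Redis Cluster hash slot (0..16383) for a key. */
    public static int slot(String key) {
        String hashed = key;
        int open = key.indexOf('{');
        if (open >= 0) {
            int close = key.indexOf('}', open + 1);
            // Only a non-empty {...} section counts as a hash tag.
            if (close > open + 1) {
                hashed = key.substring(open + 1, close);
            }
        }
        return crc16(hashed.getBytes(StandardCharsets.UTF_8)) % 16384;
    }

    /** CRC16/XMODEM: polynomial 0x1021, initial value 0, no reflection. */
    public static int crc16(byte[] bytes) {
        int crc = 0;
        for (byte b : bytes) {
            crc ^= (b & 0xFF) << 8;
            for (int bit = 0; bit < 8; bit++) {
                crc = (crc & 0x8000) != 0 ? (crc << 1) ^ 0x1021 : crc << 1;
                crc &= 0xFFFF;
            }
        }
        return crc;
    }
}
```

Two keys that share a hash tag always map to the same slot. Two different tenants usually map to different slots, but that is not guaranteed, and several slots can live on the same shard, so treat tenant partitioning as spreading load on average.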
+ +Per-entity partitioning (`events:order:{order_id}`) is the canonical pattern when you treat each entity's stream as the event-sourcing log for that entity: every state change for one order goes on its own stream, which is also bounded in size by the entity's lifetime. + +### Tune the JedisPool + +The demo bumps `JedisPoolConfig.setMaxTotal(64)` and `setMaxIdle(32)` because every blocking `XREADGROUP` call holds *its* Jedis instance for the full block duration (here 500 ms by default). If the pool were sized at the default 8, three consumers blocking at once would leave only five connections for the 16 HTTP handlers, which can starve `/state` polling under load. In production size the pool to at least *(blocking consumers) + (concurrent request threads) + headroom* and keep that ceiling below the server-side `maxclients`. + +Every helper acquires its Jedis from the pool with try-with-resources, so connections always go back to the pool on success or exception. No in-process lock is needed across helpers because each call uses its own connection. + +### Don't read with XREAD (no group) and then try to ack + +`XREAD` and `XREADGROUP` are different mechanisms. `XREAD` is a tail-the-log read with no consumer-group state — entries are not added to any PEL, and you cannot `XACK` them. If you want at-least-once delivery and crash recovery, you must read through a consumer group. + +`XREAD` is still useful for read-only tail clients (a UI streaming events, a debugger, a `tail -f`-style command-line tool). It's just not part of the at-least-once path. + +### Use a separate consumer pool per group + +The demo runs every consumer in one process. In production each consumer group is usually its own deployment — its own pool of pods or VMs — so a slow projection in `analytics` cannot pull `notifications` workers off their stream. 
Each pod runs one consumer thread per CPU core, with `XAUTOCLAIM` either embedded in the consumer loop (every N reads, claim idle entries to self) or run by a separate reaper. + +### Inspect the stream directly with redis-cli + +When testing or troubleshooting, inspect the stream directly to confirm the consumer state is what you expect: + +```bash +# Stream summary +redis-cli XLEN demo:events:orders +redis-cli XINFO STREAM demo:events:orders + +# Group cursors and pending counts +redis-cli XINFO GROUPS demo:events:orders + +# Consumers within a group +redis-cli XINFO CONSUMERS demo:events:orders notifications + +# Pending entries with idle time and delivery count +redis-cli XPENDING demo:events:orders notifications - + 20 + +# Tail the stream live (no consumer-group state — like tail -f) +redis-cli XREAD BLOCK 0 STREAMS demo:events:orders '$' + +# Replay a range +redis-cli XRANGE demo:events:orders - + COUNT 50 +``` + +If a group's `lag` is growing while consumers' `idle` times are short, consumers are healthy but producers are outpacing them — add more consumers. If `pending` is growing while `lag` is small, consumers are *receiving* entries but not *acking* them — either they are crashing mid-message or your acking logic has a bug. + +## Learn more + +This example uses the following Redis commands: + +* [`XADD`]({{< relref "/commands/xadd" >}}) to append an event with an approximate `MAXLEN` cap. +* [`XREADGROUP`]({{< relref "/commands/xreadgroup" >}}) to read new entries for a consumer in a group. +* [`XACK`]({{< relref "/commands/xack" >}}) to acknowledge a processed entry. +* [`XAUTOCLAIM`]({{< relref "/commands/xautoclaim" >}}) to reassign idle pending entries to a healthy consumer. +* [`XRANGE`]({{< relref "/commands/xrange" >}}) for replay and audit, independent of consumer-group state. +* [`XPENDING`]({{< relref "/commands/xpending" >}}) to inspect the per-group pending list with idle times and delivery counts. 
+* [`XTRIM`]({{< relref "/commands/xtrim" >}}) for explicit retention enforcement. +* [`XGROUP CREATE`]({{< relref "/commands/xgroup-create" >}}) and + [`XGROUP DELCONSUMER`]({{< relref "/commands/xgroup-delconsumer" >}}) to manage groups and consumers. +* [`XINFO STREAM`]({{< relref "/commands/xinfo-stream" >}}), + [`XINFO GROUPS`]({{< relref "/commands/xinfo-groups" >}}), and + [`XINFO CONSUMERS`]({{< relref "/commands/xinfo-consumers" >}}) for observability. + +See the [Jedis guide]({{< relref "/develop/clients/jedis" >}}) for the full client reference, and the [Streams overview]({{< relref "/develop/data-types/streams" >}}) for the deeper conceptual model — consumer groups, the PEL, claim semantics, capped streams, and the differences with Kafka partitions. diff --git a/content/develop/use-cases/streaming/java-lettuce/ConsumerWorker.java b/content/develop/use-cases/streaming/java-lettuce/ConsumerWorker.java new file mode 100644 index 0000000000..aa4421dac4 --- /dev/null +++ b/content/develop/use-cases/streaming/java-lettuce/ConsumerWorker.java @@ -0,0 +1,327 @@ +import java.util.ArrayDeque; +import java.util.ArrayList; +import java.util.Collections; +import java.util.Deque; +import java.util.LinkedHashMap; +import java.util.List; +import java.util.Map; + +/** + * Background consumer thread for a single consumer in a consumer group. + * + *

<p>Each worker owns a daemon thread that loops on + * {@code XREADGROUP >} with a short block timeout and acks every entry + * it processes. Recovery of stuck PEL entries (this consumer's, or + * anyone else's) happens through {@link #reapIdlePel()}, which is the + * textbook Streams pattern: each consumer periodically (or on demand) + * calls {@code XAUTOCLAIM} with itself as the target, then processes + * whatever it claimed. The demo's "XAUTOCLAIM to selected" button is + * exactly that call.</p> + * + * <p>Two demo-only levers are wired into the loop:</p> + * <ul> + * <li>{@link #pause()} parks the worker so its pending entries can + * age into the {@code XAUTOCLAIM} window without being consumed by + * {@code >} reads.</li> + * <li>{@link #crashNext(int)} tells the worker to drop its next + * {@code n} deliveries on the floor without acking them, the same + * effect as a worker process dying mid-message. Those entries stay + * in the group's PEL until {@link #reapIdlePel()} recovers them.</li> + * </ul> + * + * <p>Real consumers do not need either lever; they only need + * {@code XREADGROUP} → process → {@code XACK} in {@code run} plus a + * periodic {@code reapIdlePel} call.</p>

+ */
+public class ConsumerWorker {
+
+    /** A summary of the worker's status for the demo UI / JSON. */
+    public static final class ConsumerStatus {
+        public final String name;
+        public final String group;
+        public final long processed;
+        public final long reaped;
+        public final long crashedDrops;
+        public final boolean paused;
+        public final int crashQueued;
+        public final boolean alive;
+
+        public ConsumerStatus(String name, String group, long processed, long reaped,
+                long crashedDrops, boolean paused, int crashQueued, boolean alive) {
+            this.name = name;
+            this.group = group;
+            this.processed = processed;
+            this.reaped = reaped;
+            this.crashedDrops = crashedDrops;
+            this.paused = paused;
+            this.crashQueued = crashQueued;
+            this.alive = alive;
+        }
+    }
+
+    /** A row in the worker's bounded "recent activity" deque. */
+    public static final class RecentEntry {
+        public final String id;
+        public final String type;
+        public final Map<String, String> fields;
+        public final boolean acked;
+        public final String note;
+
+        public RecentEntry(String id, String type, Map<String, String> fields, boolean acked, String note) {
+            this.id = id;
+            this.type = type;
+            this.fields = fields;
+            this.acked = acked;
+            this.note = note;
+        }
+    }
+
+    /** Outcome of one reap pass. */
+    public static final class ReapResult {
+        public final int claimed;
+        public final int processed;
+        public final List<String> deletedIds;
+
+        public ReapResult(int claimed, int processed, List<String> deletedIds) {
+            this.claimed = claimed;
+            this.processed = processed;
+            this.deletedIds = deletedIds;
+        }
+    }
+
+    private final EventStream stream;
+    private final String group;
+    private final String name;
+    private final long processLatencyMs;
+    private final int recentCapacity;
+
+    private final Object lock = new Object();
+    private final Deque<RecentEntry> recent = new ArrayDeque<>();
+    private long processed = 0L;
+    private long reaped = 0L;
+    private long crashedDrops = 0L;
+    private int crashNextCount = 0;
+    private volatile boolean paused = false;
+
+    private volatile boolean stopRequested = false;
+    private Thread thread;
+
+    public ConsumerWorker(EventStream stream, String group, String name) {
+        this(stream, group, name, 25L, 20);
+    }
+
+    public ConsumerWorker(EventStream stream, String group, String name,
+            long processLatencyMs, int recentCapacity) {
+        if (stream == null || group == null || name == null) {
+            throw new IllegalArgumentException("stream, group, and name are required");
+        }
+        this.stream = stream;
+        this.group = group;
+        this.name = name;
+        this.processLatencyMs = Math.max(0L, processLatencyMs);
+        this.recentCapacity = Math.max(1, recentCapacity);
+    }
+
+    public String getGroup() { return group; }
+    public String getName() { return name; }
+
+    // ------------------------------------------------------------------
+    // Lifecycle
+    // ------------------------------------------------------------------
+
+    public synchronized void start() {
+        if (thread != null && thread.isAlive()) return;
+        stopRequested = false;
+        thread = new Thread(this::run, "consumer-" + group + "-" + name);
+        thread.setDaemon(true);
+        thread.start();
+    }
+
+    public synchronized void stop() { stop(1000L); }
+
+    public synchronized void stop(long joinTimeoutMs) {
+        stopRequested = true;
+        if (thread != null) {
+            try {
+                thread.join(joinTimeoutMs);
+            } catch (InterruptedException e) {
+                Thread.currentThread().interrupt();
+            }
+            if (!thread.isAlive()) thread = null;
+        }
+    }
+
+    // ------------------------------------------------------------------
+    // Demo levers
+    // ------------------------------------------------------------------
+
+    public void pause() { paused = true; }
+
+    public void resume() { paused = false; }
+
+    /**
+     * Drop the next {@code count} deliveries without acking them.
+     *
+     * <p>The entries stay in the group's PEL with their delivery
+     * counter incremented, so {@code XAUTOCLAIM} can recover them
+     * once they exceed the idle threshold.</p>
+     */
+    public void crashNext(int count) {
+        if (count <= 0) return;
+        synchronized (lock) {
+            crashNextCount += count;
+        }
+    }
+
+    // ------------------------------------------------------------------
+    // Introspection
+    // ------------------------------------------------------------------
+
+    public List<RecentEntry> recent() {
+        synchronized (lock) {
+            return new ArrayList<>(recent);
+        }
+    }
+
+    public ConsumerStatus status() {
+        synchronized (lock) {
+            boolean alive;
+            synchronized (this) {
+                alive = thread != null && thread.isAlive();
+            }
+            return new ConsumerStatus(
+                    name, group, processed, reaped, crashedDrops,
+                    paused, crashNextCount, alive);
+        }
+    }
+
+    // ------------------------------------------------------------------
+    // Recovery
+    // ------------------------------------------------------------------
+
+    /**
+     * Run {@code XAUTOCLAIM} into self and process the claimed entries.
+     *
+     * <p>Safe to call from any thread — the heavy lifting is the Redis
+     * round trip plus a sequential per-entry dispatch. {@code claimed}
+     * is the count returned by Redis, {@code processed} is what this
+     * consumer actually handled (may differ if a handler throws), and
+     * {@code deletedIds} are PEL entries whose stream payload was
+     * already trimmed by {@code MAXLEN ~} or {@code XTRIM} before the
+     * sweep ran. Redis 7+ removes those slots from the PEL inside
+     * {@code XAUTOCLAIM} itself, so the caller does not have to
+     * {@code XACK} them; they are reported so the caller can route
+     * them to a dead-letter store.</p>
+     */
+    public ReapResult reapIdlePel() {
+        EventStream.AutoClaimResult result = stream.autoclaim(
+                group, name, 100L, "0-0", 10);
+        int processedThisCall = 0;
+        for (EventStream.Entry entry : result.claimed) {
+            try {
+                handleEntry(entry.id, entry.fields);
+                processedThisCall += 1;
+            } catch (Exception exc) {
+                System.err.printf("[%s/%s] reap failed on %s: %s%n",
+                        group, name, entry.id, exc.getMessage());
+            }
+        }
+        if (processedThisCall > 0) {
+            synchronized (lock) {
+                reaped += processedThisCall;
+            }
+        }
+        return new ReapResult(result.claimed.size(), processedThisCall, result.deletedIds);
+    }
+
+    // ------------------------------------------------------------------
+    // Main loop
+    // ------------------------------------------------------------------
+
+    private void run() {
+        while (!stopRequested) {
+            if (paused) {
+                sleepQuietly(50L);
+                continue;
+            }
+            List<EventStream.Entry> entries;
+            try {
+                entries = stream.consume(group, name, 10L, 500L);
+            } catch (Exception exc) {
+                // Don't kill the thread on a transient Redis error; a
+                // real consumer would log this and back off.
+                System.err.printf("[%s/%s] read failed: %s%n",
+                        group, name, exc.getMessage());
+                sleepQuietly(500L);
+                continue;
+            }
+            if (entries == null || entries.isEmpty()) continue;
+            for (EventStream.Entry entry : entries) {
+                dispatch(entry.id, entry.fields);
+            }
+        }
+    }
+
+    private void dispatch(String entryId, Map<String, String> fields) {
+        if (processLatencyMs > 0) sleepQuietly(processLatencyMs);
+        try {
+            handleEntry(entryId, fields);
+        } catch (Exception exc) {
+            // A failure here (typically XACK against Redis) must not
+            // kill the daemon thread — that would silently halt this
+            // consumer while every other entry sat in its PEL waiting
+            // for XAUTOCLAIM. The entry stays unacked; the next
+            // reapIdlePel call (here or on any consumer in the group)
+            // can recover it once it exceeds the idle threshold.
+            System.err.printf("[%s/%s] failed to handle %s: %s%n",
+                    group, name, entryId, exc.getMessage());
+            String type = fields == null ? "" : fields.getOrDefault("type", "");
+            pushRecent(new RecentEntry(entryId, type, copyFields(fields),
+                    false, "handler error: " + exc.getMessage()));
+        }
+    }
+
+    private void handleEntry(String entryId, Map<String, String> fields) {
+        boolean drop = false;
+        synchronized (lock) {
+            if (crashNextCount > 0) {
+                crashNextCount -= 1;
+                drop = true;
+            }
+        }
+        String type = fields == null ? "" : fields.getOrDefault("type", "");
+        if (drop) {
+            synchronized (lock) {
+                crashedDrops += 1;
+            }
+            pushRecent(new RecentEntry(entryId, type, copyFields(fields),
+                    false, "dropped (simulated crash)"));
+            return;
+        }
+        stream.ack(group, Collections.singletonList(entryId));
+        synchronized (lock) {
+            processed += 1;
+        }
+        pushRecent(new RecentEntry(entryId, type, copyFields(fields), true, ""));
+    }
+
+    private void pushRecent(RecentEntry entry) {
+        synchronized (lock) {
+            recent.addFirst(entry);
+            while (recent.size() > recentCapacity) recent.removeLast();
+        }
+    }
+
+    private static Map<String, String> copyFields(Map<String, String> fields) {
+        if (fields == null) return new LinkedHashMap<>();
+        return new LinkedHashMap<>(fields);
+    }
+
+    private static void sleepQuietly(long ms) {
+        try {
+            Thread.sleep(ms);
+        } catch (InterruptedException e) {
+            Thread.currentThread().interrupt();
+        }
+    }
+}
diff --git a/content/develop/use-cases/streaming/java-lettuce/DemoServer.java b/content/develop/use-cases/streaming/java-lettuce/DemoServer.java
new file mode 100644
index 0000000000..5edc7b3c5b
--- /dev/null
+++ b/content/develop/use-cases/streaming/java-lettuce/DemoServer.java
@@ -0,0 +1,1165 @@
+import com.sun.net.httpserver.HttpExchange;
+import com.sun.net.httpserver.HttpHandler;
+import com.sun.net.httpserver.HttpServer;
+import io.lettuce.core.RedisClient;
+import io.lettuce.core.RedisURI;
+import io.lettuce.core.api.StatefulRedisConnection;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+import java.net.InetSocketAddress;
+import java.net.URLDecoder;
+import java.nio.charset.StandardCharsets;
+import java.util.AbstractMap;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.LinkedHashMap;
+import java.util.List;
+import java.util.Locale;
+import java.util.Map;
+import java.util.Random;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.Executors;
+import java.util.concurrent.locks.ReentrantLock;
+
+/**
+ * Redis streaming demo server using Lettuce.
+ *
+ * <p>Run this file and visit http://localhost:8784 to watch a Redis
+ * Stream in action: producers append events to a single stream, two
+ * independent consumer groups read the same stream at their own pace,
+ * and within the {@code notifications} group two consumers share the
+ * work.</p>
+ */
+public class DemoServer {
+
+    private static final String[] EVENT_TYPES = {
+            "order.placed", "order.paid", "order.shipped", "order.cancelled"
+    };
+    private static final String[] CUSTOMERS = {"alice", "bob", "carol", "dan", "erin"};
+
+    private static final Map<String, List<String>> DEFAULT_GROUPS = new LinkedHashMap<>();
+    static {
+        List<String> notifications = new ArrayList<>();
+        notifications.add("worker-a");
+        notifications.add("worker-b");
+        DEFAULT_GROUPS.put("notifications", notifications);
+        List<String> analytics = new ArrayList<>();
+        analytics.add("worker-c");
+        DEFAULT_GROUPS.put("analytics", analytics);
+    }
+
+    private static EventStream stream;
+    private static StreamingDemo demo;
+    private static RedisClient redisClient;
+    private static StatefulRedisConnection<String, String> connection;
+
+    public static void main(String[] args) {
+        String host = "127.0.0.1";
+        int port = 8784;
+        String redisHost = "localhost";
+        int redisPort = 6379;
+        String streamKey = "demo:events:orders";
+        long maxlen = 2000L;
+        long claimIdleMs = 5000L;
+        boolean resetOnStart = true;
+
+        for (int i = 0; i < args.length; i++) {
+            switch (args[i]) {
+                case "--host": host = args[++i]; break;
+                case "--port": port = Integer.parseInt(args[++i]); break;
+                case "--redis-host": redisHost = args[++i]; break;
+                case "--redis-port": redisPort = Integer.parseInt(args[++i]); break;
+                case "--stream-key": streamKey = args[++i]; break;
+                case "--maxlen": maxlen = Long.parseLong(args[++i]); break;
+                case "--claim-idle-ms": claimIdleMs = Long.parseLong(args[++i]); break;
+                case "--no-reset": resetOnStart = false; break;
+                default: break;
+            }
+        }
+
+        try {
+            redisClient = RedisClient.create(
+                    RedisURI.builder().withHost(redisHost).withPort(redisPort).build());
+            connection = redisClient.connect();
+            connection.sync().ping();
+        } catch (Exception e) {
+            System.err.printf("Failed to connect to Redis at %s:%d: %s%n",
+                    redisHost, redisPort, e.getMessage());
+            System.exit(1);
+        }
+
+        stream = new EventStream(connection, streamKey, maxlen, claimIdleMs);
+        demo = new StreamingDemo(stream);
+
+        if (resetOnStart) {
+            System.out.printf("Deleting any existing data at key '%s' for a clean demo run " + "(pass --no-reset to keep it).%n", streamKey);
+            stream.deleteStream();
+        }
+        int seeded = demo.seed(DEFAULT_GROUPS);
+
+        try {
+            HttpServer server = HttpServer.create(new InetSocketAddress(host, port), 0);
+            server.createContext("/", new RootHandler());
+            server.createContext("/state", new StateHandler());
+            server.createContext("/produce", new ProduceHandler());
+            server.createContext("/add-worker", new AddWorkerHandler());
+            server.createContext("/remove-worker", new RemoveWorkerHandler());
+            server.createContext("/crash", new CrashHandler());
+            server.createContext("/autoclaim", new AutoclaimHandler());
+            server.createContext("/trim", new TrimHandler());
+            server.createContext("/replay", new ReplayHandler());
+            server.createContext("/reset", new ResetHandler());
+            server.setExecutor(Executors.newFixedThreadPool(16));
+
+            Runtime.getRuntime().addShutdownHook(new Thread(() -> {
+                try { demo.stopAll(); } catch (Exception ignored) {}
+                if (connection != null) connection.close();
+                if (redisClient != null) redisClient.shutdown();
+            }));
+
+            server.start();
+            System.out.printf("Redis streaming demo server listening on http://%s:%d%n",
+                    host, port);
+            System.out.printf("Using Redis at %s:%d with stream key '%s' (MAXLEN ~ %d)%n",
+                    redisHost, redisPort, streamKey, maxlen);
+            System.out.printf("Seeded %d consumer(s) across %d group(s)%n",
+                    seeded, DEFAULT_GROUPS.size());
+        } catch (IOException e) {
+            System.err.println("Failed to start server: " + e.getMessage());
+            System.exit(1);
+        }
+    }
+
+    // ----- Demo registry --------------------------------------------------
+
+    /** In-memory registry of consumer workers across all groups. */
+    static class StreamingDemo {
+        private final EventStream stream;
+        private final Map<String, ConsumerWorker> workers = new ConcurrentHashMap<>();
+        private final ReentrantLock lock = new ReentrantLock();
+
+        StreamingDemo(EventStream stream) {
+            this.stream = stream;
+        }
+
+        int seed(Map<String, List<String>> groups) {
+            lock.lock();
+            try {
+                int count = 0;
+                for (Map.Entry<String, List<String>> entry : groups.entrySet()) {
+                    String group = entry.getKey();
+                    stream.ensureGroup(group, "0-0");
+                    for (String name : entry.getValue()) {
+                        addWorker(group, name);
+                        count += 1;
+                    }
+                }
+                return count;
+            } finally {
+                lock.unlock();
+            }
+        }
+
+        boolean addWorker(String group, String name) {
+            lock.lock();
+            try {
+                String key = key(group, name);
+                if (workers.containsKey(key)) return false;
+                stream.ensureGroup(group, "0-0");
+                ConsumerWorker worker = new ConsumerWorker(stream, group, name);
+                worker.start();
+                workers.put(key, worker);
+                return true;
+            } finally {
+                lock.unlock();
+            }
+        }
+
+        Map<String, Object> removeWorker(String group, String name) {
+            lock.lock();
+            try {
+                String key = key(group, name);
+                ConsumerWorker worker = workers.get(key);
+                Map<String, Object> result = new LinkedHashMap<>();
+                if (worker == null) {
+                    result.put("removed", false);
+                    result.put("reason", "not-found");
+                    return result;
+                }
+                String peer = null;
+                for (Map.Entry<String, ConsumerWorker> entry : workers.entrySet()) {
+                    ConsumerWorker candidate = entry.getValue();
+                    if (candidate == worker) continue;
+                    if (candidate.getGroup().equals(group)) {
+                        peer = candidate.getName();
+                        break;
+                    }
+                }
+                if (peer == null) {
+                    result.put("removed", false);
+                    result.put("reason", "no-peer");
+                    result.put("message", group + "/" + name + " still owns pending entries and is the only consumer in its group; " + "add another consumer first so its PEL can be handed over before deletion.");
+                    return result;
+                }
+                int handed = stream.handoverPending(group, name, peer, 100);
+                workers.remove(key);
+                worker.stop();
+                stream.deleteConsumer(group, name);
+                result.put("removed", true);
+                result.put("handed_over_to", peer);
+                result.put("handed_over_count", handed);
+                return result;
+            } finally {
+                lock.unlock();
+            }
+        }
+
+        ConsumerWorker getWorker(String group, String name) {
+            return workers.get(key(group, name));
+        }
+
+        List<ConsumerWorker> snapshot() {
+            return new ArrayList<>(workers.values());
+        }
+
+        void stopAll() {
+            lock.lock();
+            try {
+                for (ConsumerWorker worker : workers.values()) worker.stop();
+                workers.clear();
+            } finally {
+                lock.unlock();
+            }
+        }
+
+        int reset(Map<String, List<String>> groups) {
+            lock.lock();
+            try {
+                stopAll();
+                stream.deleteStream();
+                stream.resetStats();
+                return seed(groups);
+            } finally {
+                lock.unlock();
+            }
+        }
+
+        private static String key(String group, String name) {
+            return group + "\u0000" + name;
+        }
+    }
+
+    // ----- Handlers -------------------------------------------------------
+
+    static class RootHandler implements HttpHandler {
+        @Override
+        public void handle(HttpExchange exchange) throws IOException {
+            String path = exchange.getRequestURI().getPath();
+            if (!path.equals("/") && !path.equals("/index.html")) {
+                sendJson(exchange, 404, "{\"error\":\"Not Found\"}");
+                return;
+            }
+            if (!"GET".equalsIgnoreCase(exchange.getRequestMethod())) {
+                sendJson(exchange, 405, "{\"error\":\"Method Not Allowed\"}");
+                return;
+            }
+            byte[] body = renderHtmlPage().getBytes(StandardCharsets.UTF_8);
+            exchange.getResponseHeaders().set("Content-Type", "text/html; charset=utf-8");
+            exchange.sendResponseHeaders(200, body.length);
+            try (OutputStream os = exchange.getResponseBody()) { os.write(body); }
+        }
+    }
+
+    static class StateHandler implements HttpHandler {
+        @Override
+        public void handle(HttpExchange exchange) throws IOException {
+            sendJson(exchange, 200, toJson(buildState()));
+        }
+    }
+
+    static class ProduceHandler implements HttpHandler {
+        @Override
+        public void handle(HttpExchange exchange) throws IOException {
+            if (!"POST".equalsIgnoreCase(exchange.getRequestMethod())) {
+                sendJson(exchange, 405, "{\"error\":\"Method Not Allowed\"}");
+                return;
+            }
+            Map<String, String> form = parseFormData(readRequestBody(exchange));
+            int count = clamp(parseInt(form.get("count"), 1), 1, 500);
+            String eventType = form.getOrDefault("type", "").trim();
+            Random rnd = new Random();
+            List<Map.Entry<String, Map<String, String>>> events = new ArrayList<>(count);
+            for (int i = 0; i < count; i++) {
+                String picked = eventType.isEmpty()
+                        ? EVENT_TYPES[rnd.nextInt(EVENT_TYPES.length)]
+                        : eventType;
+                events.add(new AbstractMap.SimpleEntry<>(picked, fakePayload(rnd)));
+            }
+            List<String> ids = stream.produceBatch(events);
+            Map<String, Object> response = new LinkedHashMap<>();
+            response.put("produced", ids.size());
+            response.put("ids", ids);
+            sendJson(exchange, 200, toJson(response));
+        }
+    }
+
+    static class AddWorkerHandler implements HttpHandler {
+        @Override
+        public void handle(HttpExchange exchange) throws IOException {
+            if (!"POST".equalsIgnoreCase(exchange.getRequestMethod())) {
+                sendJson(exchange, 405, "{\"error\":\"Method Not Allowed\"}");
+                return;
+            }
+            Map<String, String> form = parseFormData(readRequestBody(exchange));
+            String group = form.getOrDefault("group", "").trim();
+            String name = form.getOrDefault("name", "").trim();
+            if (group.isEmpty() || name.isEmpty()) {
+                sendJson(exchange, 400, "{\"error\":\"group and name are required\"}");
+                return;
+            }
+            boolean added = demo.addWorker(group, name);
+            if (!added) {
+                sendJson(exchange, 409,
+                        "{\"error\":\"" + jsonEscape(group + "/" + name + " already exists") + "\"}");
+                return;
+            }
+            Map<String, Object> response = new LinkedHashMap<>();
+            response.put("group", group);
+            response.put("name", name);
+            sendJson(exchange, 200, toJson(response));
+        }
+    }
+
+    static class RemoveWorkerHandler implements HttpHandler {
+        @Override
+        public void handle(HttpExchange exchange) throws IOException {
+            if (!"POST".equalsIgnoreCase(exchange.getRequestMethod())) {
+                sendJson(exchange, 405, "{\"error\":\"Method Not Allowed\"}");
+                return;
+            }
+            Map<String, String> form = parseFormData(readRequestBody(exchange));
+            String group = form.getOrDefault("group", "").trim();
+            String name = form.getOrDefault("name", "").trim();
+            Map<String, Object> result = demo.removeWorker(group, name);
+            int status = (Boolean.TRUE.equals(result.get("removed"))
+                    || "not-found".equals(result.get("reason"))) ? 200 : 409;
+            sendJson(exchange, status, toJson(result));
+        }
+    }
+
+    static class CrashHandler implements HttpHandler {
+        @Override
+        public void handle(HttpExchange exchange) throws IOException {
+            if (!"POST".equalsIgnoreCase(exchange.getRequestMethod())) {
+                sendJson(exchange, 405, "{\"error\":\"Method Not Allowed\"}");
+                return;
+            }
+            Map<String, String> form = parseFormData(readRequestBody(exchange));
+            String group = form.getOrDefault("group", "").trim();
+            String name = form.getOrDefault("name", "").trim();
+            int count = parseInt(form.get("count"), 1);
+            ConsumerWorker worker = demo.getWorker(group, name);
+            if (worker == null) {
+                sendJson(exchange, 404,
+                        "{\"error\":\"" + jsonEscape("unknown consumer " + group + "/" + name) + "\"}");
+                return;
+            }
+            worker.crashNext(count);
+            sendJson(exchange, 200, "{\"queued\":" + count + "}");
+        }
+    }
+
+    static class AutoclaimHandler implements HttpHandler {
+        @Override
+        public void handle(HttpExchange exchange) throws IOException {
+            if (!"POST".equalsIgnoreCase(exchange.getRequestMethod())) {
+                sendJson(exchange, 405, "{\"error\":\"Method Not Allowed\"}");
+                return;
+            }
+            Map<String, String> form = parseFormData(readRequestBody(exchange));
+            String group = form.getOrDefault("group", "").trim();
+            String consumer = form.getOrDefault("consumer", "").trim();
+            if (group.isEmpty() || consumer.isEmpty()) {
+                sendJson(exchange, 400, "{\"error\":\"group and consumer are required\"}");
+                return;
+            }
+            ConsumerWorker worker = demo.getWorker(group, consumer);
+            if (worker == null) {
+                sendJson(exchange, 404,
+                        "{\"error\":\"" + jsonEscape("unknown consumer " + group + "/" + consumer) + "\"}");
+                return;
+            }
+            ConsumerWorker.ReapResult result = worker.reapIdlePel();
+            Map<String, Object> response = new LinkedHashMap<>();
+            response.put("claimed", result.claimed);
+            response.put("processed", result.processed);
+            response.put("deleted", result.deletedIds);
+            response.put("min_idle_ms", stream.getClaimMinIdleMs());
+            sendJson(exchange, 200, toJson(response));
+        }
+    }
+
+    static class TrimHandler implements HttpHandler {
+        @Override
+        public void handle(HttpExchange exchange) throws IOException {
+            if (!"POST".equalsIgnoreCase(exchange.getRequestMethod())) {
+                sendJson(exchange, 405, "{\"error\":\"Method Not Allowed\"}");
+                return;
+            }
+            Map<String, String> form = parseFormData(readRequestBody(exchange));
+            long maxlen = parseLong(form.get("maxlen"), 0L);
+            long deleted = stream.trimMaxlen(maxlen);
+            Map<String, Object> response = new LinkedHashMap<>();
+            response.put("deleted", deleted);
+            response.put("maxlen", maxlen);
+            sendJson(exchange, 200, toJson(response));
+        }
+    }
+
+    static class ReplayHandler implements HttpHandler {
+        @Override
+        public void handle(HttpExchange exchange) throws IOException {
+            Map<String, String> query = parseQuery(exchange.getRequestURI().getRawQuery());
+            String start = nonEmpty(query.get("start"), "-");
+            String end = nonEmpty(query.get("end"), "+");
+            long count = clamp(parseLong(query.get("count"), 20L), 1L, 500L);
+            List<EventStream.Entry> entries = stream.replay(start, end, count);
+            List<Map<String, Object>> entryList = new ArrayList<>(entries.size());
+            for (EventStream.Entry entry : entries) {
+                Map<String, Object> row = new LinkedHashMap<>();
+                row.put("id", entry.id);
+                row.put("fields", entry.fields);
+                entryList.add(row);
+            }
+            Map<String, Object> response = new LinkedHashMap<>();
+            response.put("start", start);
+            response.put("end", end);
+            response.put("limit", count);
+            response.put("entries", entryList);
+            sendJson(exchange, 200, toJson(response));
+        }
+    }
+
+    static class ResetHandler implements HttpHandler {
+        @Override
+        public void handle(HttpExchange exchange) throws IOException {
+            if (!"POST".equalsIgnoreCase(exchange.getRequestMethod())) {
+                sendJson(exchange, 405, "{\"error\":\"Method Not Allowed\"}");
+                return;
+            }
+            int count = demo.reset(DEFAULT_GROUPS);
+            sendJson(exchange, 200, "{\"consumers\":" + count + "}");
+        }
+    }
+
+    // ----- State assembly -------------------------------------------------
+
+    private static Map<String, Object> buildState() {
+        Map<String, Object> state = new LinkedHashMap<>();
+        state.put("stream", stream.infoStream());
+
+        List<EventStream.Entry> tailEntries = stream.tail(10);
+        List<Map<String, Object>> tail = new ArrayList<>(tailEntries.size());
+        for (EventStream.Entry entry : tailEntries) {
+            Map<String, Object> row = new LinkedHashMap<>();
+            row.put("id", entry.id);
+            row.put("fields", entry.fields);
+            tail.add(row);
+        }
+        state.put("tail", tail);
+
+        List<ConsumerWorker> snapshot = demo.snapshot();
+        List<Map<String, Object>> groupsDetail = new ArrayList<>();
+        List<Map<String, Object>> pendingRows = new ArrayList<>();
+
+        for (Map<String, Object> group : stream.infoGroups()) {
+            String groupName = String.valueOf(group.get("name"));
+            Map<String, Map<String, Object>> consumerInfo = new LinkedHashMap<>();
+            for (Map<String, Object> c : stream.infoConsumers(groupName)) {
+                consumerInfo.put(String.valueOf(c.get("name")), c);
+            }
+
+            List<Map<String, Object>> consumersDetail = new ArrayList<>();
+            List<String> covered = new ArrayList<>();
+            for (ConsumerWorker worker : snapshot) {
+                if (!worker.getGroup().equals(groupName)) continue;
+                Map<String, Object> info = consumerInfo.get(worker.getName());
+                ConsumerWorker.ConsumerStatus status = worker.status();
+                Map<String, Object> row = new LinkedHashMap<>();
+                row.put("name", status.name);
+                row.put("group", status.group);
+                row.put("processed", status.processed);
+                row.put("reaped", status.reaped);
+                row.put("crashed_drops", status.crashedDrops);
+                row.put("paused", status.paused);
+                row.put("crash_queued", status.crashQueued);
+                row.put("alive", status.alive);
+                row.put("pending", info == null ? 0L : info.get("pending"));
+                row.put("idle_ms", info == null ? 0L : info.get("idle_ms"));
+                List<ConsumerWorker.RecentEntry> recent = worker.recent();
+                List<Map<String, Object>> recentRows = new ArrayList<>(recent.size());
+                for (ConsumerWorker.RecentEntry e : recent) {
+                    Map<String, Object> r = new LinkedHashMap<>();
+                    r.put("id", e.id);
+                    r.put("type", e.type);
+                    r.put("fields", e.fields);
+                    r.put("acked", e.acked);
+                    r.put("note", e.note);
+                    recentRows.add(r);
+                }
+                row.put("recent", recentRows);
+                consumersDetail.add(row);
+                covered.add(worker.getName());
+            }
+            // Include orphaned consumers that exist in Redis but not
+            // in our in-process registry (e.g. left over from a prior
+            // run when --no-reset is set).
+            for (Map.Entry<String, Map<String, Object>> entry : consumerInfo.entrySet()) {
+                if (covered.contains(entry.getKey())) continue;
+                Map<String, Object> info = entry.getValue();
+                Map<String, Object> row = new LinkedHashMap<>();
+                row.put("name", entry.getKey());
+                row.put("group", groupName);
+                row.put("processed", 0L);
+                row.put("reaped", 0L);
+                row.put("crashed_drops", 0L);
+                row.put("paused", false);
+                row.put("crash_queued", 0);
+                row.put("alive", false);
+                row.put("pending", info.get("pending"));
+                row.put("idle_ms", info.get("idle_ms"));
+                row.put("recent", new ArrayList<>());
+                consumersDetail.add(row);
+            }
+            consumersDetail.sort((a, b) ->
+                    String.valueOf(a.get("name")).compareTo(String.valueOf(b.get("name"))));
+
+            Map<String, Object> groupRow = new LinkedHashMap<>(group);
+            groupRow.put("consumers_detail", consumersDetail);
+            groupsDetail.add(groupRow);
+
+            for (EventStream.PendingEntry p : stream.pendingDetail(groupName, 50)) {
+                Map<String, Object> row = new LinkedHashMap<>();
+                row.put("id", p.id);
+                row.put("consumer", p.consumer);
+                row.put("idle_ms", p.idleMs);
+                row.put("deliveries", p.deliveries);
+                row.put("group", groupName);
+                pendingRows.add(row);
+            }
+        }
+
+        state.put("groups", groupsDetail);
+        state.put("pending", pendingRows);
+        state.put("stats", stream.stats());
+        return state;
+    }
+
+    // ----- HTTP plumbing --------------------------------------------------
+
+    private static String readRequestBody(HttpExchange exchange) throws IOException {
+        try (InputStream is = exchange.getRequestBody()) {
+            return new String(is.readAllBytes(), StandardCharsets.UTF_8);
+        }
+    }
+
+    private static Map<String, String> parseFormData(String body) {
+        Map<String, String> params = new HashMap<>();
+        if (body == null || body.isEmpty()) return params;
+        for (String pair : body.split("&")) {
+            String[] kv = pair.split("=", 2);
+            if (kv.length != 2 || kv[0].isEmpty()) continue;
+            params.put(URLDecoder.decode(kv[0], StandardCharsets.UTF_8),
+                    URLDecoder.decode(kv[1], StandardCharsets.UTF_8));
+        }
+        return params;
+    }
+
+    private static Map<String, String> parseQuery(String query) {
+        if (query == null || query.isEmpty()) return new HashMap<>();
+        return parseFormData(query);
+    }
+
+    private static void sendJson(HttpExchange exchange, int status, String body) throws IOException {
+        byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
+        exchange.getResponseHeaders().set("Content-Type", "application/json");
+        exchange.sendResponseHeaders(status, bytes.length);
+        try (OutputStream os = exchange.getResponseBody()) { os.write(bytes); }
+    }
+
+    private static String toJson(Object value) {
+        StringBuilder sb = new StringBuilder();
+        appendJson(sb, value);
+        return sb.toString();
+    }
+
+    @SuppressWarnings("unchecked")
+    private static void appendJson(StringBuilder sb, Object value) {
+        if (value == null) sb.append("null");
+        else if (value instanceof Boolean) sb.append(value);
+        else if (value instanceof Number) sb.append(value);
+        else if (value instanceof Map) {
+            sb.append('{');
+            boolean first = true;
+            for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
+                if (!first) sb.append(',');
+                first = false;
+                appendJsonString(sb, String.valueOf(entry.getKey()));
+                sb.append(':');
+                appendJson(sb, entry.getValue());
+            }
+            sb.append('}');
+        } else if (value instanceof List) {
+            sb.append('[');
+            boolean first = true;
+            for (Object item : (List<?>) value) {
+                if (!first) sb.append(',');
+                first = false;
+                appendJson(sb, item);
+            }
+            sb.append(']');
+        } else {
+            appendJsonString(sb, String.valueOf(value));
+        }
+    }
+
+    private static void appendJsonString(StringBuilder sb, String value) {
+        sb.append('"').append(jsonEscape(value)).append('"');
+    }
+
+    private static String jsonEscape(String value) {
+        StringBuilder sb = new StringBuilder(value.length() + 4);
+        for (int i = 0; i < value.length(); i++) {
+            char c = value.charAt(i);
+            switch (c) {
+                case '"': sb.append("\\\""); break;
+                case '\\': sb.append("\\\\"); break;
+                case '\n': sb.append("\\n"); break;
+                case '\r': sb.append("\\r"); break;
+                case '\t': sb.append("\\t"); break;
+                default:
+                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
+                    else sb.append(c);
+            }
+        }
+        return sb.toString();
+    }
+
+    // ----- Misc helpers ---------------------------------------------------
+
+    private static int clamp(int value, int min, int max) {
+        return Math.max(min, Math.min(max, value));
+    }
+
+    private static long clamp(long value, long min, long max) {
+        return Math.max(min, Math.min(max, value));
+    }
+
+    private static int parseInt(String value, int fallback) {
+        if (value == null || value.isEmpty()) return fallback;
+        try { return Integer.parseInt(value); } catch (NumberFormatException e) { return fallback; }
+    }
+
+    private static long parseLong(String value, long fallback) {
+        if (value == null || value.isEmpty()) return fallback;
+        try { return Long.parseLong(value); } catch (NumberFormatException e) { return fallback; }
+    }
+
+    private static String nonEmpty(String value, String fallback) {
+        return (value == null || value.isEmpty()) ? fallback : value;
+    }
+
+    private static Map<String, String> fakePayload(Random rnd) {
+        Map<String, String> payload = new LinkedHashMap<>();
+        payload.put("order_id", "o-" + (1000 + rnd.nextInt(9000)));
+        payload.put("customer", CUSTOMERS[rnd.nextInt(CUSTOMERS.length)]);
+        double amount = 5.0 + rnd.nextDouble() * 245.0;
+        payload.put("amount", String.format(Locale.ROOT, "%.2f", amount));
+        return payload;
+    }
+
+    private static String renderHtmlPage() {
+        return HTML_TEMPLATE
+                .replace("__STREAM_KEY__", stream.getStreamKey())
+                .replace("__MAXLEN__", Long.toString(stream.getMaxlenApprox()))
+                .replace("__CLAIM_IDLE__", Long.toString(stream.getClaimMinIdleMs()));
+    }
+
+    // The HTML is functionally identical to the redis-py reference; the
+    // only changes are the pill text and the variable-substitution
+    // tokens that get rewritten at render time.
+    private static final String HTML_TEMPLATE = """
+ + + + + + Redis Streaming Demo + + + +
+
Lettuce + com.sun.net.httpserver
+

Redis Streaming Demo

+

+ Producers append events to a single Redis Stream + (__STREAM_KEY__). Two consumer groups read the same + stream independently: notifications shares its work + across two consumers, analytics processes the full + flow on its own. Acknowledge with XACK, recover + crashed deliveries with XAUTOCLAIM, replay any range + with XRANGE, and bound retention with XTRIM. +

+ +
+
+

Stream state

+
Loading...
+ + +
+ +
+

Produce events

+

Events are appended with XADD with an approximate + MAXLEN ~ __MAXLEN__ retention cap.

+ + + + + +
+ +
+

Replay range (XRANGE)

+

Reads a slice of history. Replay is independent of any + consumer group — no cursors move, no acks happen.

+ + + + + + + +
+ +
+

Trim retention (XTRIM)

+

Cap the stream length. Approximate trimming releases whole + macro-nodes, which is much cheaper than exact trimming.

+ + + +
+ +
+

Consumer groups

+
Loading...
+
+ +
+

Pending entries (XPENDING)

+

Entries delivered to a consumer that haven't been acked yet. + Idle time ≥ __CLAIM_IDLE__ ms is eligible for + XAUTOCLAIM.

+
Loading...
+
+ + +
+
+ +
+

Last result

+

Produce events, replay a range, or trigger an autoclaim to see results.

+
+
+ +
+
+ + + + + """; +} diff --git a/content/develop/use-cases/streaming/java-lettuce/EventStream.java b/content/develop/use-cases/streaming/java-lettuce/EventStream.java new file mode 100644 index 0000000000..e6d9701a6d --- /dev/null +++ b/content/develop/use-cases/streaming/java-lettuce/EventStream.java @@ -0,0 +1,655 @@ +import io.lettuce.core.Consumer; +import io.lettuce.core.Limit; +import io.lettuce.core.Range; +import io.lettuce.core.RedisBusyException; +import io.lettuce.core.RedisCommandExecutionException; +import io.lettuce.core.RedisFuture; +import io.lettuce.core.StreamMessage; +import io.lettuce.core.XAddArgs; +import io.lettuce.core.XAutoClaimArgs; +import io.lettuce.core.XClaimArgs; +import io.lettuce.core.XGroupCreateArgs; +import io.lettuce.core.XPendingArgs; +import io.lettuce.core.XReadArgs; +import io.lettuce.core.XTrimArgs; +import io.lettuce.core.api.StatefulRedisConnection; +import io.lettuce.core.api.async.RedisAsyncCommands; +import io.lettuce.core.api.sync.RedisCommands; +import io.lettuce.core.codec.StringCodec; +import io.lettuce.core.models.stream.ClaimedMessages; +import io.lettuce.core.models.stream.PendingMessage; +import io.lettuce.core.output.NestedMultiOutput; +import io.lettuce.core.protocol.CommandArgs; +import io.lettuce.core.protocol.CommandType; + +import java.util.ArrayList; +import java.util.Collection; +import java.util.Collections; +import java.util.LinkedHashMap; +import java.util.List; +import java.util.Map; +import java.util.concurrent.ExecutionException; +import java.util.concurrent.atomic.AtomicLong; + +/** + * Redis event-stream helper backed by a single Redis Stream. + * + *

Producers append events with {@code XADD}. Consumers belong to + * consumer groups and read with {@code XREADGROUP}. The group as a + * whole tracks a single {@code last-delivered-id} cursor, and each + * consumer gets its own pending-entries list (PEL) of in-flight + * messages it has been handed. Once a consumer has processed an entry + * it acknowledges it with {@code XACK}; entries left unacknowledged + * past an idle threshold can be swept to a healthy consumer with + * {@code XAUTOCLAIM} (or to a specific one with {@code XCLAIM}).

+ * + *

Each {@code XADD} carries an approximate {@code MAXLEN} so the + * stream stays bounded as it rolls forward. {@code XRANGE} supports + * replay over the retained history for debugging, audit, or rebuilding + * a downstream projection.

+ * + *

The same stream can be read by any number of consumer groups — + * each group has its own cursor and its own pending lists, so + * analytics, notifications, and audit can all process the full event + * flow at their own pace without coordinating with each other.

+ * + *

Lettuce-specific notes:

+ *
    + *
  • A single {@link StatefulRedisConnection} is thread-safe for + * individual command calls; this class shares one connection across + * threads. No transactions are issued, so no client-side lock is + * required around commands here.
  • + *
  • Lettuce's {@code xautoclaim} method returns a + * {@link ClaimedMessages} that only exposes the continuation cursor + * and claimed messages — it does not surface the third + * reply element (deleted IDs) that Redis 7+ introduced. To preserve + * the textbook {@code (cursor, claimed, deleted_ids)} shape that the + * reference implementation returns, this helper dispatches the + * {@code XAUTOCLAIM} command itself with a + * {@link NestedMultiOutput} so it can parse the third slot.
  • + *
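  • {@link #produceBatch(Iterable)} queues its {@code XADD} calls through + * the async API and leaves auto-flush on. {@code setAutoFlushCommands(false)} + * is connection-wide, so switching it off on this shared connection would + * stall the consumer threads' sync commands. Use a dedicated connection if + * a batch needs manual flushing.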
  • + *
+ */ +public class EventStream { + + /** A single stream entry: {@code (id, fields)}. */ + public static final class Entry { + public final String id; + public final Map<String, String> fields; + + public Entry(String id, Map<String, String> fields) { + this.id = id; + this.fields = fields; + } + } + + /** Result of an {@link #autoclaim(String, String, long, String, int)} sweep. */ + public static final class AutoClaimResult { + public final List<Entry> claimed; + public final List<String> deletedIds; + public final String nextCursor; + + public AutoClaimResult(List<Entry> claimed, List<String> deletedIds, String nextCursor) { + this.claimed = claimed; + this.deletedIds = deletedIds; + this.nextCursor = nextCursor; + } + } + + /** Per-entry pending detail. */ + public static final class PendingEntry { + public final String id; + public final String consumer; + public final long idleMs; + public final long deliveries; + + public PendingEntry(String id, String consumer, long idleMs, long deliveries) { + this.id = id; + this.consumer = consumer; + this.idleMs = idleMs; + this.deliveries = deliveries; + } + } + + private final StatefulRedisConnection<String, String> connection; + private final String streamKey; + private final long maxlenApprox; + private final long claimMinIdleMs; + + private final AtomicLong producedTotal = new AtomicLong(); + private final AtomicLong ackedTotal = new AtomicLong(); + private final AtomicLong claimedTotal = new AtomicLong(); + + public EventStream(StatefulRedisConnection<String, String> connection) { + this(connection, "demo:events:orders", 10_000L, 15_000L); + } + + public EventStream( + StatefulRedisConnection<String, String> connection, + String streamKey, + long maxlenApprox, + long claimMinIdleMs) { + if (connection == null) { + throw new IllegalArgumentException("connection is required"); + } + if (streamKey == null || streamKey.isEmpty()) { + throw new IllegalArgumentException("streamKey is required"); + } + this.connection = connection; + this.streamKey = streamKey; + this.maxlenApprox = maxlenApprox; + this.claimMinIdleMs =
claimMinIdleMs; + } + + public String getStreamKey() { return streamKey; } + public long getMaxlenApprox() { return maxlenApprox; } + public long getClaimMinIdleMs() { return claimMinIdleMs; } + + // ------------------------------------------------------------------ + // Producer + // ------------------------------------------------------------------ + + /** Append a single event. Returns the stream ID Redis assigned. */ + public String produce(String eventType, Map<String, String> payload) { + Map<String, String> fields = encodeFields(eventType, payload); + XAddArgs args = XAddArgs.Builder.maxlen(maxlenApprox).approximateTrimming(); + String id = connection.sync().xadd(streamKey, args, fields); + producedTotal.incrementAndGet(); + return id; + } + + /** + * Pipeline several {@code XADD} calls via the async API. + * + *

Lettuce's async API lets us queue commands and await their + * futures in bulk, but + * {@link io.lettuce.core.api.StatefulConnection#setAutoFlushCommands(boolean)} + * is connection-wide: turning auto-flush off so we can + * batch this method's writes would also stall every other thread + * that is currently issuing sync commands on the same connection + * (the consumer workers' {@code XREADGROUP} loops, in this demo). + * That stall is silent: no exception, just a hung consumer + * whose command sits in the buffer until something else triggers + * a flush.

+ * + *

The safe pattern here is to leave auto-flush on and queue the + * async {@code XADD} calls. Lettuce still pipelines them when they + * arrive faster than the round-trip latency, and the futures + * complete independently of any other thread's traffic. For + * truly large batches you would use a dedicated connection so + * {@code setAutoFlushCommands(false)} can't affect other threads.

+ */ + public List<String> produceBatch(Iterable<Map.Entry<String, Map<String, String>>> events) { + RedisAsyncCommands<String, String> async = connection.async(); + List<RedisFuture<String>> futures = new ArrayList<>(); + for (Map.Entry<String, Map<String, String>> event : events) { + if (event == null) continue; + Map<String, String> fields = encodeFields(event.getKey(), event.getValue()); + XAddArgs args = XAddArgs.Builder.maxlen(maxlenApprox).approximateTrimming(); + futures.add(async.xadd(streamKey, args, fields)); + } + List<String> ids = new ArrayList<>(futures.size()); + for (RedisFuture<String> future : futures) { + try { + ids.add(future.get()); + } catch (InterruptedException e) { + Thread.currentThread().interrupt(); + throw new RuntimeException("produceBatch interrupted", e); + } catch (ExecutionException e) { + throw new RuntimeException("produceBatch failed", e.getCause()); + } + } + if (!ids.isEmpty()) producedTotal.addAndGet(ids.size()); + return ids; + } + + private static Map<String, String> encodeFields(String eventType, Map<String, String> payload) { + Map<String, String> fields = new LinkedHashMap<>(); + fields.put("type", eventType); + fields.put("ts_ms", Long.toString(System.currentTimeMillis())); + if (payload != null) { + for (Map.Entry<String, String> kv : payload.entrySet()) { + fields.put(kv.getKey(), kv.getValue() == null ? "" : kv.getValue()); + } + } + return fields; + } + + // ------------------------------------------------------------------ + // Consumer groups + // ------------------------------------------------------------------ + + /** + * Create the consumer group if it doesn't exist. Pass {@code "$"} + * for "only events appended after now", or {@code "0-0"} to replay + * everything from the beginning of the stream. + */ + public void ensureGroup(String group, String startId) { + XReadArgs.StreamOffset<String> offset = XReadArgs.StreamOffset.from(streamKey, startId); + try { + connection.sync().xgroupCreate(offset, group, XGroupCreateArgs.Builder.mkstream()); + } catch (RedisBusyException ignored) { + // BUSYGROUP: group already exists, swallow.
+ } catch (RedisCommandExecutionException exc) { + // Some Lettuce builds throw the generic execution exception + // for BUSYGROUP instead of the typed RedisBusyException; + // sniff the message rather than crash on a benign duplicate. + String msg = exc.getMessage(); + if (msg == null || !msg.contains("BUSYGROUP")) { + throw exc; + } + } + } + + public long deleteGroup(String group) { + Boolean destroyed = connection.sync().xgroupDestroy(streamKey, group); + return Boolean.TRUE.equals(destroyed) ? 1L : 0L; + } + + /** + * Read new entries for {@code consumer} via {@code XREADGROUP >}. + * + *

The {@code >} offset means "deliver entries this group has not + * delivered to anyone yet" — the at-least-once path. Use + * {@link #consumeOwnPel(String, String, long)} for the recovery + * path that re-delivers entries already in this consumer's PEL.

+ */ + public List consume(String group, String consumer, long count, long blockMs) { + XReadArgs.StreamOffset offset = XReadArgs.StreamOffset.lastConsumed(streamKey); + XReadArgs args = XReadArgs.Builder.count(count).block(blockMs); + List> raw = connection.sync() + .xreadgroup(Consumer.from(group, consumer), args, offset); + return toEntries(raw); + } + + /** + * Re-deliver entries already in this consumer's PEL (offset + * {@code "0"}). + * + *

Reading with an explicit ID instead of {@code >} replays the + * entries already assigned to this consumer name without advancing + * the group's {@code last-delivered-id}. Canonical recovery path + * after a crash on the same consumer name, and also how a consumer + * picks up entries that {@code XAUTOCLAIM} or {@code XCLAIM} just + * handed it.

+ */ + public List consumeOwnPel(String group, String consumer, long count) { + XReadArgs.StreamOffset offset = XReadArgs.StreamOffset.from(streamKey, "0"); + XReadArgs args = XReadArgs.Builder.count(count); + List> raw = connection.sync() + .xreadgroup(Consumer.from(group, consumer), args, offset); + return toEntries(raw); + } + + public long ack(String group, Collection ids) { + if (ids == null || ids.isEmpty()) return 0L; + Long n = connection.sync().xack(streamKey, group, ids.toArray(new String[0])); + long count = n == null ? 0L : n; + if (count > 0) ackedTotal.addAndGet(count); + return count; + } + + /** + * Sweep idle pending entries to {@code consumer}, paging through + * the PEL with {@code XAUTOCLAIM}'s continuation cursor. + * + *

For a full sweep, loop until the cursor returns to + * {@code "0-0"} (or {@code maxPages} as a safety net so a very + * large PEL can't monopolise the call).

+ * + *

{@code deletedIds} are PEL entries whose stream payload had + * already been trimmed by the time this sweep ran (typically + * because {@code MAXLEN ~} retention outran a slow consumer). + * {@code XAUTOCLAIM} on Redis 7+ removes those dangling slots from + * the PEL itself — the caller does not need to + * {@code XACK} them — but they cannot be retried, so log and route + * them to a dead-letter store for observability.

+ * + *

Lettuce's high-level {@code xautoclaim} method does not surface + * the third (deleted-IDs) element of the reply, so this method + * dispatches the command directly with a + * {@link NestedMultiOutput} and parses all three slots.

+ */ + public AutoClaimResult autoclaim( + String group, String consumer, long pageCount, String startId, int maxPages) { + List claimedAll = new ArrayList<>(); + List deletedAll = new ArrayList<>(); + String cursor = startId == null ? "0-0" : startId; + String lastCursor = cursor; + for (int i = 0; i < maxPages; i++) { + AutoClaimPage page = autoclaimPage(group, consumer, pageCount, cursor); + claimedAll.addAll(page.claimed); + deletedAll.addAll(page.deletedIds); + lastCursor = page.nextCursor; + if ("0-0".equals(page.nextCursor)) break; + cursor = page.nextCursor; + } + if (!claimedAll.isEmpty()) claimedTotal.addAndGet(claimedAll.size()); + return new AutoClaimResult(claimedAll, deletedAll, lastCursor); + } + + private static final class AutoClaimPage { + final List claimed; + final List deletedIds; + final String nextCursor; + + AutoClaimPage(List claimed, List deletedIds, String nextCursor) { + this.claimed = claimed; + this.deletedIds = deletedIds; + this.nextCursor = nextCursor; + } + } + + /** + * Dispatch {@code XAUTOCLAIM} as a raw command so we can read the + * Redis 7+ third reply slot (deleted IDs). Lettuce's typed helper + * returns {@link ClaimedMessages}, which only exposes the cursor + * and the claimed messages. 
+ */ + @SuppressWarnings("unchecked") + private AutoClaimPage autoclaimPage(String group, String consumer, long count, String startId) { + CommandArgs commandArgs = new CommandArgs<>(StringCodec.UTF8) + .addKey(streamKey) + .add(group) + .add(consumer) + .add(claimMinIdleMs) + .add(startId) + .add("COUNT") + .add(count); + List reply = connection.sync().dispatch( + CommandType.XAUTOCLAIM, + new NestedMultiOutput<>(StringCodec.UTF8), + commandArgs); + String nextCursor = "0-0"; + List claimed = new ArrayList<>(); + List deletedIds = new ArrayList<>(); + if (reply != null) { + if (reply.size() >= 1 && reply.get(0) instanceof String) { + nextCursor = (String) reply.get(0); + } + if (reply.size() >= 2 && reply.get(1) instanceof List) { + for (Object item : (List) reply.get(1)) { + if (!(item instanceof List)) continue; + List entryList = (List) item; + if (entryList.size() < 2) continue; + String id = String.valueOf(entryList.get(0)); + Map fields = new LinkedHashMap<>(); + Object fieldObj = entryList.get(1); + if (fieldObj instanceof List) { + List flat = (List) fieldObj; + for (int k = 0; k + 1 < flat.size(); k += 2) { + fields.put(String.valueOf(flat.get(k)), String.valueOf(flat.get(k + 1))); + } + } + claimed.add(new Entry(id, fields)); + } + } + if (reply.size() >= 3 && reply.get(2) instanceof List) { + for (Object item : (List) reply.get(2)) { + deletedIds.add(String.valueOf(item)); + } + } + } + return new AutoClaimPage(claimed, deletedIds, nextCursor); + } + + /** + * Drop a consumer from a group. + * + *

{@code XGROUP DELCONSUMER} destroys this consumer's PEL + * entries — any entry it still owned is no longer tracked anywhere + * in the group, and {@code XAUTOCLAIM} will never find it again. + * Call {@link #handoverPending(String, String, String, int)} (or + * {@code XCLAIM} manually) to a healthy consumer first; this method + * is the raw destructive call and is exposed only for explicit + * cleanup.

+ */ + public long deleteConsumer(String group, String consumer) { + try { + Long n = connection.sync().xgroupDelconsumer(streamKey, Consumer.from(group, consumer)); + return n == null ? 0L : n; + } catch (RedisCommandExecutionException exc) { + return 0L; + } + } + + /** + * Move every PEL entry owned by {@code fromConsumer} to + * {@code toConsumer}. + * + *

Enumerates the source consumer's PEL with + * {@code XPENDING ... CONSUMER} and reassigns each ID with + * {@code XCLAIM} at zero idle time so the move is unconditional. + * ({@code XAUTOCLAIM} does not filter by source consumer, so it + * cannot be used for a per-consumer handover.)

+ * + *

Call this before {@link #deleteConsumer(String, String)} + * whenever the source still has pending entries — otherwise + * {@code XGROUP DELCONSUMER} would silently destroy them.

+ */ + public int handoverPending(String group, String fromConsumer, String toConsumer, int batch) { + RedisCommands sync = connection.sync(); + Consumer source = Consumer.from(group, fromConsumer); + Consumer target = Consumer.from(group, toConsumer); + int handed = 0; + while (true) { + XPendingArgs args = XPendingArgs.Builder.xpending( + source, Range.unbounded(), Limit.from(batch)); + List rows = sync.xpending(streamKey, args); + if (rows == null || rows.isEmpty()) break; + String[] ids = new String[rows.size()]; + for (int i = 0; i < rows.size(); i++) ids[i] = rows.get(i).getId(); + List> claimed = sync.xclaim( + streamKey, target, XClaimArgs.Builder.minIdleTime(0L), ids); + handed += claimed == null ? 0 : claimed.size(); + if (rows.size() < batch) break; + } + if (handed > 0) claimedTotal.addAndGet(handed); + return handed; + } + + // ------------------------------------------------------------------ + // Replay, length, trim + // ------------------------------------------------------------------ + + /** + * Range read with {@code XRANGE} for replay or audit. Read-only: + * ranges do not update any group cursor and do not ack anything. + */ + public List replay(String startId, String endId, long count) { + Range range = Range.create( + startId == null ? "-" : startId, + endId == null ? "+" : endId); + List> raw = connection.sync() + .xrange(streamKey, range, Limit.from(count)); + return toEntries(raw); + } + + /** Most-recent entries first, via {@code XREVRANGE}. */ + public List tail(long count) { + List> raw = connection.sync() + .xrevrange(streamKey, Range.create("-", "+"), Limit.from(count)); + return toEntries(raw); + } + + public long length() { + Long n = connection.sync().xlen(streamKey); + return n == null ? 0L : n; + } + + public long trimMaxlen(long maxlen) { + Long n = connection.sync().xtrim(streamKey, + new XTrimArgs().maxlen(maxlen).approximateTrimming()); + return n == null ? 
0L : n; + } + + public long trimMinid(String minid) { + Long n = connection.sync().xtrim(streamKey, + new XTrimArgs().minId(minid).approximateTrimming()); + return n == null ? 0L : n; + } + + // ------------------------------------------------------------------ + // Inspection + // ------------------------------------------------------------------ + + /** Subset of {@code XINFO STREAM} safe to JSON-encode. */ + public Map infoStream() { + Map out = new LinkedHashMap<>(); + out.put("length", 0L); + out.put("last_generated_id", null); + out.put("first_entry_id", null); + out.put("last_entry_id", null); + try { + List raw = connection.sync().xinfoStream(streamKey); + Map info = pairList(raw); + Object lengthObj = info.get("length"); + if (lengthObj instanceof Number) out.put("length", ((Number) lengthObj).longValue()); + out.put("last_generated_id", info.get("last-generated-id")); + out.put("first_entry_id", firstEntryId(info.get("first-entry"))); + out.put("last_entry_id", firstEntryId(info.get("last-entry"))); + } catch (RedisCommandExecutionException ignored) { + // Stream does not exist yet. + } + return out; + } + + /** {@code XINFO GROUPS} as a JSON-friendly list of maps. */ + public List> infoGroups() { + List> out = new ArrayList<>(); + try { + List raw = connection.sync().xinfoGroups(streamKey); + if (raw == null) return out; + for (Object groupObj : raw) { + if (!(groupObj instanceof List)) continue; + Map info = pairList((List) groupObj); + Map row = new LinkedHashMap<>(); + row.put("name", String.valueOf(info.get("name"))); + row.put("consumers", asLong(info.get("consumers"))); + row.put("pending", asLong(info.get("pending"))); + row.put("last_delivered_id", info.get("last-delivered-id")); + Object lag = info.get("lag"); + row.put("lag", lag instanceof Number ? ((Number) lag).longValue() : null); + out.add(row); + } + } catch (RedisCommandExecutionException ignored) { + } + return out; + } + + /** {@code XINFO CONSUMERS} as a JSON-friendly list of maps. 
*/ + public List> infoConsumers(String group) { + List> out = new ArrayList<>(); + try { + List raw = connection.sync().xinfoConsumers(streamKey, group); + if (raw == null) return out; + for (Object consumerObj : raw) { + if (!(consumerObj instanceof List)) continue; + Map info = pairList((List) consumerObj); + Map row = new LinkedHashMap<>(); + row.put("name", String.valueOf(info.get("name"))); + row.put("pending", asLong(info.get("pending"))); + row.put("idle_ms", asLong(info.get("idle"))); + out.add(row); + } + } catch (RedisCommandExecutionException ignored) { + } + return out; + } + + /** Per-entry PEL view: id, consumer, idle, deliveries. */ + public List pendingDetail(String group, int count) { + List out = new ArrayList<>(); + try { + List rows = connection.sync().xpending( + streamKey, group, Range.unbounded(), Limit.from(count)); + if (rows == null) return out; + for (PendingMessage row : rows) { + out.add(new PendingEntry( + row.getId(), + row.getConsumer(), + row.getMsSinceLastDelivery(), + row.getRedeliveryCount())); + } + } catch (RedisCommandExecutionException ignored) { + } + return out; + } + + // ------------------------------------------------------------------ + // Stats and demo housekeeping + // ------------------------------------------------------------------ + + public Map stats() { + Map out = new LinkedHashMap<>(); + out.put("produced_total", producedTotal.get()); + out.put("acked_total", ackedTotal.get()); + out.put("claimed_total", claimedTotal.get()); + return out; + } + + public void resetStats() { + producedTotal.set(0); + ackedTotal.set(0); + claimedTotal.set(0); + } + + /** Drop the stream key entirely. Used by the demo's reset path. 
*/ + public void deleteStream() { + connection.sync().del(streamKey); + } + + // ------------------------------------------------------------------ + // Helpers + // ------------------------------------------------------------------ + + private static List toEntries(List> raw) { + if (raw == null || raw.isEmpty()) return Collections.emptyList(); + List out = new ArrayList<>(raw.size()); + for (StreamMessage msg : raw) { + Map body = msg.getBody(); + out.add(new Entry(msg.getId(), body == null ? new LinkedHashMap<>() : body)); + } + return out; + } + + /** + * Convert a Redis flat alternating list (key, value, key, value, ...) + * into a Map. Used to parse XINFO replies, which Lettuce returns as + * a raw {@code List}. + */ + @SuppressWarnings("unchecked") + private static Map pairList(List flat) { + Map out = new LinkedHashMap<>(); + if (flat == null) return out; + for (int i = 0; i + 1 < flat.size(); i += 2) { + Object keyObj = flat.get(i); + if (keyObj == null) continue; + out.put(String.valueOf(keyObj), flat.get(i + 1)); + } + return out; + } + + /** + * XINFO STREAM's {@code first-entry} / {@code last-entry} comes + * back as {@code [id, [field, value, ...]]} or {@code nil}. We + * only need the ID for the demo's state view. + */ + private static String firstEntryId(Object value) { + if (!(value instanceof List)) return null; + List entry = (List) value; + if (entry.isEmpty()) return null; + Object id = entry.get(0); + return id == null ? 
null : String.valueOf(id); + } + + private static long asLong(Object value) { + if (value instanceof Number) return ((Number) value).longValue(); + if (value instanceof String) { + try { return Long.parseLong((String) value); } catch (NumberFormatException ignored) { return 0L; } + } + return 0L; + } +} diff --git a/content/develop/use-cases/streaming/java-lettuce/_index.md b/content/develop/use-cases/streaming/java-lettuce/_index.md new file mode 100644 index 0000000000..09f4c04d32 --- /dev/null +++ b/content/develop/use-cases/streaming/java-lettuce/_index.md @@ -0,0 +1,477 @@ +--- +categories: +- docs +- develop +- stack +- oss +- rs +- rc +description: Implement a Redis event-streaming pipeline in Java with Lettuce +linkTitle: Lettuce example (Java) +title: Redis streaming with Lettuce +weight: 5 +--- + +This guide shows you how to build a Redis-backed event-streaming pipeline in Java with the [Lettuce]({{< relref "/develop/clients/lettuce" >}}) client library. It includes a small local web server built on the JDK's `com.sun.net.httpserver` so you can produce events into a single Redis Stream, watch two independent consumer groups read it at their own pace, and recover stuck deliveries with `XAUTOCLAIM` after simulating a consumer crash. + +## Overview + +A Redis Stream is an append-only log of field/value entries with auto-generated, time-ordered IDs. Producers append with [`XADD`]({{< relref "/commands/xadd" >}}); consumers belong to *consumer groups* and read with [`XREADGROUP`]({{< relref "/commands/xreadgroup" >}}). The group as a whole tracks a single `last-delivered-id` cursor, and each consumer gets its own pending-entries list (PEL) of messages it has been handed but not yet acknowledged. 
Once a consumer has processed an entry it calls [`XACK`]({{< relref "/commands/xack" >}}) to clear the entry from its PEL; entries left unacknowledged past an idle threshold can be reassigned to a healthy consumer with [`XAUTOCLAIM`]({{< relref "/commands/xautoclaim" >}}). + +That gives you: + +* Ordered, durable history that many independent consumer groups can read at their own pace +* At-least-once delivery, with per-consumer pending lists and automatic recovery of crashed consumers +* Horizontal scaling within a group — add a consumer and Redis automatically splits the work +* Replay of any range with [`XRANGE`]({{< relref "/commands/xrange" >}}), independent of consumer-group state +* Bounded retention through [`XADD MAXLEN ~`]({{< relref "/commands/xadd" >}}) or + [`XTRIM MINID ~`]({{< relref "/commands/xtrim" >}}), without a separate cleanup job + +In this example, producers append order events (`order.placed`, `order.paid`, `order.shipped`, `order.cancelled`) to a single stream at `demo:events:orders`. Two consumer groups read the same stream: + +* **`notifications`** — two consumers (`worker-a`, `worker-b`) sharing the work, modelling a fan-out worker pool. +* **`analytics`** — one consumer (`worker-c`) processing the full event flow on its own. + +## How it works + +The flow looks like this: + +1. The application calls `stream.produce(eventType, payload)` which runs [`XADD`]({{< relref "/commands/xadd" >}}) with an approximate [`MAXLEN ~`]({{< relref "/commands/xadd" >}}) cap. Redis assigns an auto-generated time-ordered ID. +2. Each `ConsumerWorker` runs a daemon thread that loops on [`XREADGROUP`]({{< relref "/commands/xreadgroup" >}}) with the special ID `>` (meaning "deliver entries this group has not yet delivered to anyone") and a short block timeout. +3. After processing each entry, the consumer calls [`XACK`]({{< relref "/commands/xack" >}}) so Redis can drop it from the group's pending list. +4. 
If a consumer is killed (or crashes) before acking, its entries sit in the group's PEL. A periodic [`XAUTOCLAIM`]({{< relref "/commands/xautoclaim" >}}) sweep reassigns idle entries to a healthy consumer. +5. Anyone — including code outside the consumer groups — can read history with [`XRANGE`]({{< relref "/commands/xrange" >}}) without affecting any group's cursor. + +Each consumer group has its own cursor (`last-delivered-id`) and its own pending list, so the two groups in this demo process the same events without coordinating with each other. + +## The event-stream helper + +The `EventStream` class wraps the stream operations +([source](https://github.com/redis/docs/blob/main/content/develop/use-cases/streaming/java-lettuce/EventStream.java)): + +```java +import io.lettuce.core.RedisClient; +import io.lettuce.core.RedisURI; +import io.lettuce.core.api.StatefulRedisConnection; + +RedisClient client = RedisClient.create( + RedisURI.builder().withHost("localhost").withPort(6379).build()); +StatefulRedisConnection connection = client.connect(); + +EventStream stream = new EventStream( + connection, + "demo:events:orders", + 2000L, // approximate MAXLEN retention guardrail + 5000L); // XAUTOCLAIM idle threshold (ms) + +// Producer +Map payload = new LinkedHashMap<>(); +payload.put("order_id", "o-1234"); +payload.put("customer", "alice"); +payload.put("amount", "49.50"); +String streamId = stream.produce("order.placed", payload); + +// Consumer group + one consumer +stream.ensureGroup("notifications", "0-0"); +List entries = stream.consume( + "notifications", "worker-a", 10L, 500L); +for (EventStream.Entry entry : entries) { + handle(entry.fields); // your processing + stream.ack("notifications", List.of(entry.id)); // XACK +} + +// Recover stuck PEL entries by reaping them into a healthy consumer. +// The textbook pattern: each consumer periodically calls XAUTOCLAIM +// with itself as the target and processes whatever it claimed. 
+// ConsumerWorker.reapIdlePel() wraps that flow; the low-level helper +// stream.autoclaim(group, target, ...) is also available if you want +// to drive XAUTOCLAIM directly. +ConsumerWorker.ReapResult result = workerB.reapIdlePel(); +// result.claimed, result.processed, result.deletedIds +// deletedIds are PEL entries whose payload was already trimmed. +// Redis 7+ has already removed those slots from the PEL, so no XACK +// is needed — log them and route to a dead-letter store for audit. + +// Replay history (independent of any group's cursor) +for (EventStream.Entry entry : stream.replay("-", "+", 50L)) { + System.out.println(entry.id + " " + entry.fields); +} +``` + +### Data model + +Each event is a single stream entry — a flat map of field/value strings — with an auto-generated time-ordered ID: + +```text +demo:events:orders + 1716998413541-0 type=order.placed order_id=o-1234 customer=alice amount=49.50 ts_ms=... + 1716998413542-0 type=order.paid order_id=o-1234 customer=alice amount=49.50 ts_ms=... + 1716998413542-1 type=order.shipped order_id=o-1235 customer=bob amount=12.00 ts_ms=... + ... +``` + +The ID is `{milliseconds}-{sequence}`, monotonically increasing within the stream, so you can range-query by approximate wall-clock time without an extra index. (IDs are ordered within a stream, not across streams — two events appended to different streams at the same millisecond can produce the same ID.) 
The implementation uses: + +* [`XADD`]({{< relref "/commands/xadd" >}}) with [`XAddArgs.maxlen(n).approximateTrimming()`](https://github.com/redis/lettuce/) on every append, so the stream stays bounded as it rolls forward +* [`XREADGROUP`]({{< relref "/commands/xreadgroup" >}}) with `XReadArgs.StreamOffset.lastConsumed(...)` (the `>` offset in CLI) for fresh deliveries +* [`XACK`]({{< relref "/commands/xack" >}}) on every processed entry +* [`XAUTOCLAIM`]({{< relref "/commands/xautoclaim" >}}) for sweeping idle pending entries to a healthy consumer +* [`XRANGE`]({{< relref "/commands/xrange" >}}) for replay and audit +* [`XPENDING`]({{< relref "/commands/xpending" >}}) for inspecting the per-group pending list +* [`XINFO STREAM`]({{< relref "/commands/xinfo-stream" >}}), + [`XINFO GROUPS`]({{< relref "/commands/xinfo-groups" >}}), and + [`XINFO CONSUMERS`]({{< relref "/commands/xinfo-consumers" >}}) for surface-level observability +* [`XTRIM`]({{< relref "/commands/xtrim" >}}) for explicit retention enforcement + +## Producing events + +`produceBatch` queues several `XADD` calls through Lettuce's async API. Each call carries an approximate `MAXLEN ~` cap so the stream stays bounded as it rolls forward: + +```java +public List<String> produceBatch(Iterable<Map.Entry<String, Map<String, String>>> events) { + RedisAsyncCommands<String, String> async = connection.async(); + List<RedisFuture<String>> futures = new ArrayList<>(); + for (Map.Entry<String, Map<String, String>> event : events) { + Map<String, String> fields = encodeFields(event.getKey(), event.getValue()); + XAddArgs args = XAddArgs.Builder.maxlen(maxlenApprox).approximateTrimming(); + futures.add(async.xadd(streamKey, args, fields)); + } + List<String> ids = new ArrayList<>(futures.size()); + for (RedisFuture<String> future : futures) { + ids.add(future.get()); + } + return ids; +} +``` + +The `~` flavour of `MAXLEN` (set with `approximateTrimming()`) lets Redis trim at a macro-node boundary, which is much cheaper than exact trimming and is what you want when the cap is a retention *guardrail*, not a hard size constraint.
With 300 events produced and `MAXLEN ~ 50`, you might end up with 100 entries left, because Redis releases only whole macro-nodes and stops before the length would drop below the cap. Subsequent `XADD` calls keep the length roughly stable. + +If you genuinely need an exact cap (rare), call `.exactTrimming()` instead of `.approximateTrimming()`. The performance difference is significant on busy streams. + +A Lettuce-specific point on batching: the obvious "true pipeline" pattern of `setAutoFlushCommands(false)` + queue + `flushCommands()` + bulk-await is **connection-scoped**: auto-flush off on the shared connection stalls every *other* thread that is issuing sync commands on the same connection. In this demo the consumer workers' `XREADGROUP` loops use the sync API on the same connection, so flipping the connection's auto-flush flag to batch produces would freeze every consumer thread until the batch finished. The helper sidesteps that by leaving auto-flush on; Lettuce still pipelines the queued async `XADD` calls when they arrive faster than the round-trip latency. For truly large produce batches you would use a dedicated `RedisClient.connect()` for the producer and toggle `setAutoFlushCommands(false)` on *that* connection only. + +## Reading with a consumer group + +Each consumer in a group runs the same `XREADGROUP` loop. The `XReadArgs.StreamOffset.lastConsumed(streamKey)` offset is the CLI's `>`, meaning "deliver entries this group has not yet delivered to *anyone*": + +```java +public List<Entry> consume(String group, String consumer, long count, long blockMs) { + XReadArgs.StreamOffset<String> offset = + XReadArgs.StreamOffset.lastConsumed(streamKey); + XReadArgs args = XReadArgs.Builder.count(count).block(blockMs); + List<StreamMessage<String, String>> raw = connection.sync() + .xreadgroup(Consumer.from(group, consumer), args, offset); + return toEntries(raw); +} +``` + +`blockMs` makes the call efficient even when the stream is idle: the client parks on the server until either an entry arrives or the timeout expires, so consumers don't busy-loop.
+ +Reading with an explicit ID like `0` (via `XReadArgs.StreamOffset.from(streamKey, "0")`) does something different: it replays entries already delivered to *this* consumer name (its private PEL). That is the canonical recovery path when the same consumer restarts: catch up on its own pending entries first, then resume reading new ones. `EventStream` exposes that as `consumeOwnPel`. + +## Acknowledging entries + +Once the consumer has processed an entry, `XACK` tells Redis it can drop the entry from the group's pending list: + +```java +public long ack(String group, Collection<String> ids) { + if (ids == null || ids.isEmpty()) return 0L; + Long n = connection.sync().xack(streamKey, group, ids.toArray(new String[0])); + return n == null ? 0L : n; +} +``` + +This is the linchpin of at-least-once delivery: an entry that is never acked stays in the PEL until a claim moves it elsewhere. If your consumer thread crashes between processing and ack, the next claim sweep picks the entry back up. The one caveat is retention: `XADD MAXLEN ~` and `XTRIM` can release the entry's *payload* even while its ID is still in the PEL. The next `XAUTOCLAIM` returns those IDs in its deleted-IDs list and removes them from the PEL inside the same command. The entry cannot be retried, so the caller should log it and route it to a dead-letter store for audit. + +The trade-off is the opposite of pub/sub: a slow or crashed consumer doesn't lose messages, but an entry can be delivered more than once, so your downstream processing must be idempotent. If you process an order twice because the first attempt died after the side effect but before the ack, the second attempt must be safe. + +## Multiple consumer groups, one stream + +The big difference between Redis Streams and a job queue is that any number of independent consumer groups can read the same stream.
The demo sets up two groups on `demo:events:orders`: + +```java +stream.ensureGroup("notifications", "0-0"); +stream.ensureGroup("analytics", "0-0"); +``` + +Each group has its own cursor. Producing 5 events results in `notifications` and `analytics` each receiving all 5, with no coordination between them. Within `notifications`, the work is split across `worker-a` and `worker-b`: Redis hands each `XREADGROUP` call whatever entries are not yet delivered to anyone in the group, so adding a second worker doubles throughput without any rebalance logic. + +The `"0-0"` start ID means "deliver everything in the stream from the beginning" — useful in a demo and for fresh groups bootstrapping from history. In production, a brand-new group reading a long-existing stream usually starts at `$` ("only events after this point") and uses [`XRANGE`]({{< relref "/commands/xrange" >}}) explicitly if it needs history. + +## Recovering crashed consumers with XAUTOCLAIM + +The demo's "Crash next 3" button tells a chosen consumer to drop its next three deliveries on the floor without acking them — the same effect as a worker process dying mid-message. Those entries stay in the group's PEL with their delivery counter incremented. Once they have been idle for at least `claim_min_idle_ms`, any healthy consumer in the group can rescue them by calling `XAUTOCLAIM` *with itself as the target*. 
`ConsumerWorker.reapIdlePel` wraps that pattern +([source](https://github.com/redis/docs/blob/main/content/develop/use-cases/streaming/java-lettuce/ConsumerWorker.java)): + +```java +public ReapResult reapIdlePel() { + EventStream.AutoClaimResult result = stream.autoclaim( + group, name, 100L, "0-0", 10); + int processedThisCall = 0; + for (EventStream.Entry entry : result.claimed) { + try { + handleEntry(entry.id, entry.fields); + processedThisCall += 1; + } catch (Exception exc) { + System.err.printf("[%s/%s] reap failed on %s: %s%n", + group, name, entry.id, exc.getMessage()); + } + } + return new ReapResult( + result.claimed.size(), processedThisCall, result.deletedIds); +} +``` + +The underlying `stream.autoclaim` helper pages through the group's PEL with `XAUTOCLAIM`'s continuation cursor: + +```java +public AutoClaimResult autoclaim( + String group, String consumer, long pageCount, String startId, int maxPages) { + List<Entry> claimedAll = new ArrayList<>(); + List<String> deletedAll = new ArrayList<>(); + String cursor = startId == null ? "0-0" : startId; + String lastCursor = cursor; + for (int i = 0; i < maxPages; i++) { + AutoClaimPage page = autoclaimPage(group, consumer, pageCount, cursor); + claimedAll.addAll(page.claimed); + deletedAll.addAll(page.deletedIds); + lastCursor = page.nextCursor; + if ("0-0".equals(page.nextCursor)) break; + cursor = page.nextCursor; + } + return new AutoClaimResult(claimedAll, deletedAll, lastCursor); +} +``` + +A single `XAUTOCLAIM` call scans up to `pageCount` PEL entries starting at `startId`, reassigns the ones idle for at least `min_idle_time` to the named consumer, and returns a continuation cursor in the first slot of the reply. For a full sweep, loop until the cursor returns to `"0-0"` (with a `maxPages` safety net so one call cannot monopolise a very large PEL).
The delivery counter is incremented on every claim — after a few cycles you can use it to spot a *poison-pill* message that crashes every consumer that touches it, and route it to a dead-letter stream so the bad entry stops cycling. (New entries keep flowing past the poison pill — `XREADGROUP >` still delivers fresh work — but the bad entry's repeated reclaim wastes consumer time and keeps the PEL larger than it needs to be.) + +The `deletedIds` list contains PEL entry IDs whose stream payload was already trimmed by the time the claim ran (typically because `MAXLEN ~` retention outran a slow consumer). `XAUTOCLAIM` on Redis 7+ removes those dangling slots from the PEL itself, so the caller does *not* need to `XACK` them — but the entries cannot be retried either, so log and route them to a dead-letter store for offline inspection. Redis 7.0 introduced this third return element; the example requires Redis 7.0+ for that reason. + +A Lettuce-specific quirk: Lettuce's typed `xautoclaim` method returns a `ClaimedMessages` object that only exposes the continuation cursor and the claimed messages — it does **not** surface the third (deleted-IDs) slot. To preserve the textbook `(cursor, claimed, deletedIds)` shape that the other clients return, `EventStream.autoclaim` dispatches the command itself with a `NestedMultiOutput` and parses all three slots manually. It's a small amount of boilerplate, but until Lettuce extends `ClaimedMessages` there is no clean way to read the deleted IDs through the typed helper. + +`reapIdlePel` is the right primitive for the recovery path because it claims and processes in one step: every entry the call returned is now in *this* consumer's PEL, so the same consumer is responsible for processing and acking it. In production each consumer thread runs `reapIdlePel` periodically (every few seconds, on a timer) so a crashed peer's entries never sit invisibly. 
The demo exposes it as a manual button so you can trigger the reap after waiting for the idle threshold. + +`XCLAIM` (singular, no auto) does the same thing for a specific list of entry IDs you already have in hand. It is useful when you want to take ownership of one known stuck entry, or when you need to move a specific consumer's PEL to a peer (the case the demo's "Remove consumer" button handles via `handoverPending`). `XAUTOCLAIM` cannot filter by source consumer, so it cannot be used for a per-consumer handover. + +## Replay with XRANGE + +`XRANGE` reads a slice of history. It is completely independent of any consumer group (no cursors move, no acks happen), so it is safe to call any number of times, from any process: + +```java +public List<Entry> replay(String startId, String endId, long count) { + Range<String> range = Range.create( + startId == null ? "-" : startId, + endId == null ? "+" : endId); + List<StreamMessage<String, String>> raw = connection.sync() + .xrange(streamKey, range, Limit.from(count)); + return toEntries(raw); +} +``` + +The special IDs `-` and `+` mean "from the very beginning" and "to the very end". You can also pass real IDs (`1716998413541-0`) or just the millisecond part (`1716998413541`); for an incomplete ID Redis fills in the sequence itself, using `0` at the start of a range and the maximum sequence at the end, so the range covers every entry with that timestamp. + +Typical uses: + +* **Bootstrapping a new projection**: read the entire stream from `-` and build a derived view in another store (a search index, a SQL table, a different cache). Doing this against a consumer group would consume the entries; `XRANGE` lets you do it without disrupting live consumers. +* **Auditing recent activity**: read the last few minutes by ID range without touching any group cursor. +* **Debugging**: fetch one specific entry by its ID, or a tight range around an incident timestamp, to see exactly what producers wrote.
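
Because the millisecond prefix of an entry ID is a wall-clock timestamp, the start ID for a "last few minutes" audit can be computed on the client. The following is a minimal sketch of that arithmetic; `StreamIds` and its methods are hypothetical helpers written for illustration and are not part of the demo source.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper (not in the demo) for turning wall-clock times into stream IDs. */
final class StreamIds {
    private StreamIds() {}

    /** Extracts the millisecond timestamp from an ID such as "1716998413541-0". */
    static long millisOf(String entryId) {
        int dash = entryId.indexOf('-');
        String millis = dash < 0 ? entryId : entryId.substring(0, dash);
        return Long.parseLong(millis);
    }

    /** Lowest possible ID for the given instant, suitable as the start of a range. */
    static String startIdAt(Instant instant) {
        return instant.toEpochMilli() + "-0";
    }

    /** Start ID for a window of the given length that ends at {@code now}. */
    static String startIdForWindow(Instant now, Duration window) {
        return startIdAt(now.minus(window));
    }

    public static void main(String[] args) {
        // Start ID for "the last five minutes", measured on the client clock.
        System.out.println(startIdForWindow(Instant.now(), Duration.ofMinutes(5)));
        // Wall-clock time at which a sample entry was appended.
        System.out.println(Instant.ofEpochMilli(millisOf("1716998413541-0")));
    }
}
```

The returned string could be passed as the `startId` argument of `stream.replay(startId, "+", count)`. Auto-generated IDs use the Redis server's clock, so this sketch assumes the client and server clocks are reasonably close.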
+ +## The consumer worker thread + +`ConsumerWorker` wraps the `XREADGROUP` → process → `XACK` loop in a daemon thread: + +```java +private void run() { + while (!stopRequested) { + if (paused) { + sleepQuietly(50L); + continue; + } + List<EventStream.Entry> entries; + try { + entries = stream.consume(group, name, 10L, 500L); + } catch (Exception exc) { + System.err.printf("[%s/%s] read failed: %s%n", + group, name, exc.getMessage()); + sleepQuietly(500L); + continue; + } + if (entries == null || entries.isEmpty()) continue; + for (EventStream.Entry entry : entries) { + dispatch(entry.id, entry.fields); + } + } +} +``` + +`dispatch` either acks (the normal path) or, when the demo has asked the worker to "crash", drops the entry on the floor and increments a counter so the UI can show what is currently in the PEL waiting to be claimed. + +Recovery of stuck PEL entries (this consumer's own after a restart, or another consumer's after a crash) runs through a separate `reapIdlePel` method rather than the read loop. That method calls `XAUTOCLAIM` with this consumer as the target, then processes whatever was claimed in the same flow as new entries. This is the textbook Streams pattern: each consumer is its own reaper, running `XAUTOCLAIM(self)` periodically (or on demand) so a crashed peer's entries never sit invisibly in the PEL. The demo's "XAUTOCLAIM to selected" button calls `reapIdlePel` on the chosen consumer; in production you would run it from a timer every few seconds. + +Note that the worker's main read loop deliberately does *not* read its own PEL on every iteration (the `consumeOwnPel` helper exists for the explicit recovery case). Re-delivering every pending entry on every loop iteration would *reset its idle counter to zero* each time, which would keep crashed entries below the `XAUTOCLAIM` threshold forever. Using `XAUTOCLAIM(self)` as the recovery primitive, which only fires for entries idle longer than `min_idle_time`, avoids that whole class of bug.
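
The idle-clock argument above can be checked with a small simulation. This is a toy model of a single pending entry, not Redis or Lettuce code; the class `IdleClockModel`, its methods, and the timing values are assumptions chosen only to illustrate the reasoning.

```java
/** Toy model of one pending entry's idle clock; an illustration, not Redis code. */
final class IdleClockModel {
    private long lastDeliveredAtMs;

    IdleClockModel(long deliveredAtMs) {
        this.lastDeliveredAtMs = deliveredAtMs;
    }

    /** A re-delivery (for example re-reading the consumer's own PEL) restarts the idle clock. */
    void redeliver(long nowMs) {
        lastDeliveredAtMs = nowMs;
    }

    long idleMs(long nowMs) {
        return nowMs - lastDeliveredAtMs;
    }

    /** Mirrors the claim rule: only entries idle for at least minIdleMs qualify. */
    boolean claimable(long nowMs, long minIdleMs) {
        return idleMs(nowMs) >= minIdleMs;
    }

    /**
     * Steps a simulated clock and returns the first time the entry becomes
     * claimable, or -1 if it never does within the horizon. A rereadEveryMs
     * of 0 models a read loop that never re-reads its own PEL.
     */
    static long firstClaimableAt(long rereadEveryMs, long minIdleMs, long horizonMs, long stepMs) {
        IdleClockModel entry = new IdleClockModel(0L);
        for (long now = stepMs; now <= horizonMs; now += stepMs) {
            if (entry.claimable(now, minIdleMs)) return now;
            if (rereadEveryMs > 0 && now % rereadEveryMs == 0) entry.redeliver(now);
        }
        return -1L;
    }

    public static void main(String[] args) {
        // Loop that re-reads its own PEL every 500 ms versus one that never does.
        System.out.println(firstClaimableAt(500L, 5000L, 60_000L, 100L));
        System.out.println(firstClaimableAt(0L, 5000L, 60_000L, 100L));
    }
}
```

In this model, a loop that re-reads its own PEL every 500 ms never lets the entry reach a 5000 ms threshold, while a loop that leaves the PEL alone makes it claimable as soon as the threshold elapses.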
+ +The pause and crash levers exist only for the demo. A real consumer is just the read-process-ack loop — everything else in this class is instrumentation. + +## Prerequisites + +* Redis 7.0 or later. `XAUTOCLAIM` was added in Redis 6.2, but its reply gained a third element (the list of deleted IDs) in 7.0; the example relies on that shape. +* JDK 17 or later (the demo uses Java text blocks for the inlined HTML). +* The Lettuce JAR (and its Netty + Reactor dependencies) on your classpath. Get them from + [Maven Central](https://repo1.maven.org/maven2/io/lettuce/lettuce-core/), + or via Maven/Gradle in a project setup. + +If your Redis server is running elsewhere, start the demo with `--redis-host` and `--redis-port`. + +## Running the demo + +### Get the source files + +The demo consists of three Java files. Download them from the [`java-lettuce` source folder](https://github.com/redis/docs/tree/main/content/develop/use-cases/streaming/java-lettuce) on GitHub, or grab them with `curl`: + +```bash +mkdir streaming-demo && cd streaming-demo +BASE=https://raw.githubusercontent.com/redis/docs/main/content/develop/use-cases/streaming/java-lettuce +curl -O $BASE/EventStream.java +curl -O $BASE/ConsumerWorker.java +curl -O $BASE/DemoServer.java +``` + +You also need Lettuce and its runtime dependencies on your classpath. 
The simplest way is to download them into a local `lib/` directory: + +```bash +mkdir lib && cd lib +LETTUCE=https://repo1.maven.org/maven2/io/lettuce/lettuce-core/6.5.0.RELEASE +curl -O $LETTUCE/lettuce-core-6.5.0.RELEASE.jar +NETTY=https://repo1.maven.org/maven2/io/netty +for ARTIFACT in netty-buffer netty-codec netty-common netty-handler \ + netty-resolver netty-transport netty-transport-native-unix-common; do + curl -O "$NETTY/$ARTIFACT/4.1.113.Final/$ARTIFACT-4.1.113.Final.jar" +done +curl -O https://repo1.maven.org/maven2/io/projectreactor/reactor-core/3.6.6/reactor-core-3.6.6.jar +curl -O https://repo1.maven.org/maven2/org/reactivestreams/reactive-streams/1.0.4/reactive-streams-1.0.4.jar +cd .. +``` + +### Start the demo server + +From the demo directory: + +```bash +javac -cp 'lib/*' EventStream.java ConsumerWorker.java DemoServer.java +java -cp '.:lib/*' DemoServer +``` + +You should see: + +```text +Deleting any existing data at key 'demo:events:orders' for a clean demo run (pass --no-reset to keep it). +Redis streaming demo server listening on http://127.0.0.1:8784 +Using Redis at localhost:6379 with stream key 'demo:events:orders' (MAXLEN ~ 2000) +Seeded 3 consumer(s) across 2 group(s) +``` + +By default the demo wipes the configured stream key on startup so each run starts from a clean state. Pass `--no-reset` to keep any existing data at the key (useful when re-running against the same stream to inspect prior state), or `--stream-key <key>` to point the demo at a different key entirely. Other supported flags: `--port`, `--redis-host`, `--redis-port`, `--maxlen`, `--claim-idle-ms`. + +Open [http://127.0.0.1:8784](http://127.0.0.1:8784) in a browser. You can: + +* **Produce** any number of events of a chosen type (or random types). Watch the stream length grow and the tail update. +* See each **consumer group**: its `last-delivered-id`, the size of its pending list, and the consumers in it.
Each consumer shows its processed count, pending count, and idle time. +* **Add or remove** consumers within a group at runtime to see Redis split the work across the new shape. +* Click **Crash next 3** on a consumer to drop its next three deliveries, which has the same effect as a worker process dying after `XREADGROUP` but before `XACK`. Watch the **Pending entries (XPENDING)** panel fill up. +* Wait until the idle time exceeds the threshold (default 5000 ms), pick a healthy target consumer, and click **XAUTOCLAIM to selected**. The stuck entries are reassigned and the delivery counter increments. +* **Replay (XRANGE)** any range to confirm the full history is independent of consumer-group state. +* **XTRIM** with an approximate `MAXLEN` to bound retention. Note that an approximate trim only releases whole macro-nodes: `MAXLEN ~ 50` on a small stream may not delete anything; on a 300-entry stream it typically lands at around 100. +* Click **Reset demo** to drop the stream and re-seed the default groups. + +## Production usage + +### Pick retention by length or by minimum ID + +The demo uses `MAXLEN ~` on every `XADD`. Two alternatives are worth considering: + +* `MINID ~ <id>`: keep only entries newer than an ID. If you want "the last 24 hours", compute the wall-clock cutoff and call `stream.trimMinid(...)` (or `XTRIM <key> MINID ~ <cutoff-ms>-0`). This is the right pattern when retention is time-bounded. +* No cap on `XADD` plus a periodic `XTRIM` job: useful if your producer is hot and the per-`XADD` work has to stay minimal, or if retention rules are complex (a separate process can also factor in consumer-group lag). + +In all three cases the trimming is approximate by default. Use exact trimming (`.exactTrimming()`) only when you genuinely need an exact count. + +### Don't let consumer-group lag silently grow + +`XINFO GROUPS` reports each group's `lag` (entries the group has not yet read) and `pending` (entries delivered but not acked).
In production, alert on either of these crossing a threshold — a steadily growing pending count usually means consumers are crashing without `XAUTOCLAIM` running, and a growing lag means consumers can't keep up with producers. + +The same applies inside a group: `XINFO CONSUMERS` reports per-consumer pending counts and idle times, so you can spot one slow consumer holding entries that the rest of the group is waiting on. + +### Make consumer logic idempotent + +`XAUTOCLAIM` can re-deliver an entry to a different consumer after a crash. If your processing has side effects (sending email, charging a card, updating a downstream store), make sure the same entry processed twice gives the same result — use an idempotency key, an upsert with conditional check, or a once-per-id guard table. Redis Streams cannot give you exactly-once semantics on its own. + +### Bound the delivery counter as a poison-pill signal + +`XPENDING` returns each entry's delivery count, incremented on every claim. If an entry has been delivered (and dropped) several times, the next consumer is unlikely to fare better. After some threshold — `deliveries >= 5`, say — route the entry to a *dead-letter stream*, ack it on the original group, and alert. New entries keep flowing past a poison pill (`XREADGROUP >` still delivers fresh work), but the bad entry's repeated reclaim wastes consumer time and keeps the PEL bigger than it needs to be — without a DLQ threshold it can also slowly trip retention/lag alerts. + +### Partition by tenant or entity for scale + +A single Redis Stream is a single key, and on a Redis Cluster a single key lives on a single shard. If your throughput exceeds what one shard can handle, partition the stream — for example by tenant ID (`events:orders:{tenant_a}`, `events:orders:{tenant_b}`) — so different tenants land on different shards. Hash-tags (`{tenant_a}`) keep all related streams on the same shard if you need to multi-stream atomically. 
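
To see why a hash tag pins related keys to one shard, it helps to compute the cluster hash slot by hand. The sketch below follows the key-to-slot rule from the Redis Cluster specification (CRC16 of the key, or of the hash-tag content when a non-empty `{...}` is present, modulo 16384). `ClusterSlots` is a hypothetical helper for illustration only, and the `audit:{tenant_a}` key is an invented example; in practice your client or `CLUSTER KEYSLOT` does this for you.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not in the demo) that mirrors Redis Cluster key-to-slot mapping. */
final class ClusterSlots {
    private static final int SLOT_COUNT = 16384;

    private ClusterSlots() {}

    /** CRC16 (XMODEM variant: polynomial 0x1021, initial value 0), computed bit by bit. */
    static int crc16(byte[] data) {
        int crc = 0;
        for (byte b : data) {
            crc ^= (b & 0xFF) << 8;
            for (int bit = 0; bit < 8; bit++) {
                crc = (crc & 0x8000) != 0 ? (crc << 1) ^ 0x1021 : crc << 1;
                crc &= 0xFFFF;
            }
        }
        return crc;
    }

    /** Returns the hash-tag content if the key has a non-empty {...}, else the whole key. */
    static String hashedPart(String key) {
        int open = key.indexOf('{');
        if (open >= 0) {
            int close = key.indexOf('}', open + 1);
            if (close > open + 1) {
                return key.substring(open + 1, close);
            }
        }
        return key;
    }

    static int slotOf(String key) {
        return crc16(hashedPart(key).getBytes(StandardCharsets.UTF_8)) % SLOT_COUNT;
    }

    public static void main(String[] args) {
        // Keys that share a hash tag map to the same slot, whatever surrounds the tag.
        System.out.println(slotOf("events:orders:{tenant_a}"));
        System.out.println(slotOf("audit:{tenant_a}"));
        // A different tag is hashed independently and will usually land elsewhere.
        System.out.println(slotOf("events:orders:{tenant_b}"));
    }
}
```

Two different tags usually map to different slots, but slots are grouped onto shards, so distinct tenants are spread across the cluster statistically rather than guaranteed to be separated.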
+ +Per-entity partitioning (`events:order:{order_id}`) is the canonical pattern when you treat each entity's stream as the event-sourcing log for that entity: every state change for one order goes on its own stream, which is also bounded in size by the entity's lifetime. + +### Use a separate consumer pool per group + +The demo runs every consumer in one process. In production each consumer group is usually its own deployment — its own pool of pods or VMs — so a slow projection in `analytics` cannot pull `notifications` workers off their stream. Each pod runs one consumer thread per CPU core, with `XAUTOCLAIM` either embedded in the consumer loop (every N reads, claim idle entries to self) or run by a separate reaper. + +### Don't read with XREAD (no group) and then try to ack + +`XREAD` and `XREADGROUP` are different mechanisms. `XREAD` is a tail-the-log read with no consumer-group state — entries are not added to any PEL, and you cannot `XACK` them. If you want at-least-once delivery and crash recovery, you must read through a consumer group. + +`XREAD` is still useful for read-only tail clients (a UI streaming events, a debugger, a `tail -f`-style command-line tool). It's just not part of the at-least-once path. + +### Use a dedicated connection (or a pool) for batched produces + +The demo shares one `StatefulRedisConnection` across HTTP handlers and consumer threads. Lettuce is thread-safe for individual commands, but `setAutoFlushCommands(false)` is **connection-wide**: turning auto-flush off so an async batch can flush in one round trip would stall every other thread that is mid-flight on the same connection — and sync calls on those threads would silently hang until something else triggered a flush. The helper's `produceBatch` therefore leaves auto-flush on and relies on Lettuce's natural pipelining of arrival-rate-faster-than-RTT writes. 
For large produce batches use [`ConnectionPoolSupport`](https://github.com/redis/lettuce/wiki/Connection-Pooling) (or a second `RedisClient.connect()` dedicated to the producer) and toggle `setAutoFlushCommands(false)` on the dedicated connection only. + +If you mix `setAutoFlushCommands(false)` with the **sync** API on the same connection, the call deadlocks silently — each sync call awaits its own future, and with auto-flush off those futures never complete. Always use the async API (`connection.async()`) inside the auto-flush-off window and await the futures in bulk; restore auto-flush to `true` in a `finally` block. + +### Inspect the stream directly with redis-cli + +When testing or troubleshooting, inspect the stream directly to confirm the consumer state is what you expect: + +```bash +# Stream summary +redis-cli XLEN demo:events:orders +redis-cli XINFO STREAM demo:events:orders + +# Group cursors and pending counts +redis-cli XINFO GROUPS demo:events:orders + +# Consumers within a group +redis-cli XINFO CONSUMERS demo:events:orders notifications + +# Pending entries with idle time and delivery count +redis-cli XPENDING demo:events:orders notifications - + 20 + +# Tail the stream live (no consumer-group state — like tail -f) +redis-cli XREAD BLOCK 0 STREAMS demo:events:orders '$' + +# Replay a range +redis-cli XRANGE demo:events:orders - + COUNT 50 +``` + +If a group's `lag` is growing while consumers' `idle` times are short, consumers are healthy but producers are outpacing them — add more consumers. If `pending` is growing while `lag` is small, consumers are *receiving* entries but not *acking* them — either they are crashing mid-message or your acking logic has a bug. + +## Learn more + +This example uses the following Redis commands: + +* [`XADD`]({{< relref "/commands/xadd" >}}) to append an event with an approximate `MAXLEN` cap. +* [`XREADGROUP`]({{< relref "/commands/xreadgroup" >}}) to read new entries for a consumer in a group. 
+* [`XACK`]({{< relref "/commands/xack" >}}) to acknowledge a processed entry. +* [`XAUTOCLAIM`]({{< relref "/commands/xautoclaim" >}}) to reassign idle pending entries to a healthy consumer. +* [`XRANGE`]({{< relref "/commands/xrange" >}}) for replay and audit, independent of consumer-group state. +* [`XPENDING`]({{< relref "/commands/xpending" >}}) to inspect the per-group pending list with idle times and delivery counts. +* [`XTRIM`]({{< relref "/commands/xtrim" >}}) for explicit retention enforcement. +* [`XGROUP CREATE`]({{< relref "/commands/xgroup-create" >}}) and + [`XGROUP DELCONSUMER`]({{< relref "/commands/xgroup-delconsumer" >}}) to manage groups and consumers. +* [`XINFO STREAM`]({{< relref "/commands/xinfo-stream" >}}), + [`XINFO GROUPS`]({{< relref "/commands/xinfo-groups" >}}), and + [`XINFO CONSUMERS`]({{< relref "/commands/xinfo-consumers" >}}) for observability. + +See the [Lettuce guide]({{< relref "/develop/clients/lettuce" >}}) for the full client reference, and the [Streams overview]({{< relref "/develop/data-types/streams" >}}) for the deeper conceptual model — consumer groups, the PEL, claim semantics, capped streams, and the differences with Kafka partitions. diff --git a/content/develop/use-cases/streaming/nodejs/_index.md b/content/develop/use-cases/streaming/nodejs/_index.md new file mode 100644 index 0000000000..8b7abc36d0 --- /dev/null +++ b/content/develop/use-cases/streaming/nodejs/_index.md @@ -0,0 +1,494 @@ +--- +categories: +- docs +- develop +- stack +- oss +- rs +- rc +description: Implement a Redis event-streaming pipeline in Node.js with node-redis +linkTitle: node-redis example (Node.js) +title: Redis streaming with node-redis +weight: 2 +--- + +This guide shows you how to build a Redis-backed event-streaming pipeline in Node.js with [`node-redis`]({{< relref "/develop/clients/nodejs" >}}). 
It includes a small local web server built with the Node.js standard `http` module so you can produce events into a single Redis Stream, watch two independent consumer groups read it at their own pace, and recover stuck deliveries with `XAUTOCLAIM` after simulating a consumer crash. + +## Overview + +A Redis Stream is an append-only log of field/value entries with auto-generated, time-ordered IDs. Producers append with [`XADD`]({{< relref "/commands/xadd" >}}); consumers belong to *consumer groups* and read with [`XREADGROUP`]({{< relref "/commands/xreadgroup" >}}). The group as a whole tracks a single `last-delivered-id` cursor, and each consumer gets its own pending-entries list (PEL) of messages it has been handed but not yet acknowledged. Once a consumer has processed an entry it calls [`XACK`]({{< relref "/commands/xack" >}}) to clear the entry from its PEL; entries left unacknowledged past an idle threshold can be reassigned to a healthy consumer with [`XAUTOCLAIM`]({{< relref "/commands/xautoclaim" >}}). + +That gives you: + +* Ordered, durable history that many independent consumer groups can read at their own pace +* At-least-once delivery, with per-consumer pending lists and automatic recovery of crashed consumers +* Horizontal scaling within a group — add a consumer and Redis automatically splits the work +* Replay of any range with [`XRANGE`]({{< relref "/commands/xrange" >}}), independent of consumer-group state +* Bounded retention through [`XADD MAXLEN ~`]({{< relref "/commands/xadd" >}}) or + [`XTRIM MINID ~`]({{< relref "/commands/xtrim" >}}), without a separate cleanup job + +In this example, producers append order events (`order.placed`, `order.paid`, `order.shipped`, `order.cancelled`) to a single stream at `demo:events:orders`. Two consumer groups read the same stream: + +* **`notifications`** — two consumers (`worker-a`, `worker-b`) sharing the work, modelling a fan-out worker pool. 
+* **`analytics`** — one consumer (`worker-c`) processing the full event flow on its own. + +## How it works + +The flow looks like this: + +1. The application calls `stream.produce(eventType, payload)` which runs [`XADD`]({{< relref "/commands/xadd" >}}) with an approximate [`MAXLEN ~`]({{< relref "/commands/xadd" >}}) cap. Redis assigns an auto-generated time-ordered ID. +2. Each consumer's async loop calls [`XREADGROUP`]({{< relref "/commands/xreadgroup" >}}) with the special ID `>` (meaning "deliver entries this group has not yet delivered to anyone") and a short block timeout. +3. After processing each entry, the consumer calls [`XACK`]({{< relref "/commands/xack" >}}) so Redis can drop it from the group's pending list. +4. If a consumer is killed (or crashes) before acking, its entries sit in the group's PEL. A periodic [`XAUTOCLAIM`]({{< relref "/commands/xautoclaim" >}}) sweep reassigns idle entries to a healthy consumer. +5. Anyone — including code outside the consumer groups — can read history with [`XRANGE`]({{< relref "/commands/xrange" >}}) without affecting any group's cursor. + +Each consumer group has its own cursor (`last-delivered-id`) and its own pending list, so the two groups in this demo process the same events without coordinating with each other. 
+ +## The event-stream helper + +The `EventStream` class wraps the stream operations +([source](https://github.com/redis/docs/blob/main/content/develop/use-cases/streaming/nodejs/eventStream.js)): + +```javascript +const { createClient } = require("redis"); +const { EventStream } = require("./eventStream"); + +const client = createClient({ socket: { host: "localhost", port: 6379 } }); +await client.connect(); + +const stream = new EventStream({ + redisClient: client, + streamKey: "demo:events:orders", + maxlenApprox: 2000, // retention guardrail + claimMinIdleMs: 5000, // XAUTOCLAIM threshold +}); + +// Producer +const streamId = await stream.produce("order.placed", { + order_id: "o-1234", + customer: "alice", + amount: "49.50", +}); + +// Consumer group + one consumer +await stream.ensureGroup("notifications", "0-0"); +const entries = await stream.consume("notifications", "worker-a", 10, 500); +for (const [entryId, fields] of entries) { + handle(fields); // your processing + await stream.ack("notifications", [entryId]); // XACK +} + +// Recover stuck PEL entries by reaping them into a healthy consumer. +// The textbook pattern: each consumer periodically calls XAUTOCLAIM +// with itself as the target and processes whatever it claimed. +// `ConsumerWorker.reapIdlePel` wraps that flow; the low-level helper +// `stream.autoclaim(group, targetName)` is also available if you +// want to drive XAUTOCLAIM directly. +const result = await workerB.reapIdlePel(); +// result === { claimed: N, processed: M, deletedIds: [...] } +// deletedIds are PEL entries whose payload was already trimmed. +// Redis 7+ has already removed those slots from the PEL, so no XACK +// is needed — log them and route to a dead-letter store for audit. 
+ +// Replay history (independent of any group's cursor) +for (const [entryId, fields] of await stream.replay("-", "+", 50)) { + console.log(entryId, fields); +} +``` + +### Data model + +Each event is a single stream entry — a flat object of field/value strings — with an auto-generated time-ordered ID: + +```text +demo:events:orders + 1716998413541-0 type=order.placed order_id=o-1234 customer=alice amount=49.50 ts_ms=... + 1716998413542-0 type=order.paid order_id=o-1234 customer=alice amount=49.50 ts_ms=... + 1716998413542-1 type=order.shipped order_id=o-1235 customer=bob amount=12.00 ts_ms=... + ... +``` + +The ID is `{milliseconds}-{sequence}`, monotonically increasing within the stream, so you can range-query by approximate wall-clock time without an extra index. (IDs are ordered within a stream, not across streams — two events appended to different streams at the same millisecond can produce the same ID.) The implementation uses: + +* [`XADD ... MAXLEN ~ n`]({{< relref "/commands/xadd" >}}), pipelined, for batch production with a retention cap +* [`XREADGROUP`]({{< relref "/commands/xreadgroup" >}}) with the special ID `>` for fresh deliveries to a consumer +* [`XACK`]({{< relref "/commands/xack" >}}) on every processed entry +* [`XAUTOCLAIM`]({{< relref "/commands/xautoclaim" >}}) for sweeping idle pending entries to a healthy consumer +* [`XRANGE`]({{< relref "/commands/xrange" >}}) for replay and audit +* [`XPENDING`]({{< relref "/commands/xpending" >}}) for inspecting the per-group pending list +* [`XINFO STREAM`]({{< relref "/commands/xinfo-stream" >}}), + [`XINFO GROUPS`]({{< relref "/commands/xinfo-groups" >}}), and + [`XINFO CONSUMERS`]({{< relref "/commands/xinfo-consumers" >}}) for surface-level observability +* [`XTRIM`]({{< relref "/commands/xtrim" >}}) for explicit retention enforcement + +## Producing events + +`produceBatch` pipelines `XADD` calls in a single round trip. 
Each call carries an approximate `MAXLEN ~` cap so the stream stays bounded as it rolls forward: + +```javascript +async produceBatch(events) { + const list = Array.from(events); + if (list.length === 0) return []; + const pipe = this.redis.multi(); + for (const [eventType, payload] of list) { + const fields = EventStream._encodeFields(eventType, payload); + pipe.xAdd(this.streamKey, "*", fields, { + TRIM: { + strategy: "MAXLEN", + strategyModifier: "~", + threshold: this.maxlenApprox, + }, + }); + } + // execAsPipeline sends the commands in one round trip without + // wrapping them in MULTI/EXEC. + const ids = await pipe.execAsPipeline(); + this._producedTotal += ids.length; + return ids.map((id) => String(id)); +} +``` + +The `~` flavour of `MAXLEN` lets Redis trim at a macro-node boundary, which is much cheaper than exact trimming and is what you want when the cap is a retention *guardrail*, not a hard size constraint. With 300 events produced and `MAXLEN ~ 50`, you might end up with 100 entries left — Redis released the oldest whole macro-node and stopped. The next `XADD` will keep length stable. + +If you genuinely need an exact cap (rare), use `strategyModifier: "="` instead of `"~"`. The performance difference is significant on busy streams. + +## Reading with a consumer group + +Each consumer in a group runs the same `XREADGROUP` loop. The special ID `>` means "deliver entries this group has not yet delivered to *anyone*": + +```javascript +async consume(group, consumer, count = 10, blockMs = 500) { + const raw = await this.redis.xReadGroup( + group, + consumer, + [{ key: this.streamKey, id: ">" }], + { COUNT: count, BLOCK: blockMs }, + ); + return flattenEntries(raw); +} +``` + +`blockMs` makes the call efficient even when the stream is idle: the client parks on the server until either an entry arrives or the timeout expires, so consumers don't busy-loop. 
Because `XREADGROUP BLOCK` ties up the underlying socket until the server replies, each consumer in this demo uses its own Redis client (a duplicated connection) so its blocking read never serialises behind another worker's read or behind an HTTP-handler command. + +Reading with an explicit ID like `0-0` instead of `>` does something different — it replays entries already delivered to *this* consumer name (its private PEL). That is the canonical recovery path when the same consumer restarts: catch up on its own pending entries first, then resume reading new ones. + +## Acknowledging entries + +Once the consumer has processed an entry, `XACK` tells Redis it can drop the entry from the group's pending list: + +```javascript +async ack(group, ids) { + const idList = Array.from(ids); + if (idList.length === 0) return 0; + const n = Number(await this.redis.xAck(this.streamKey, group, idList)); + this._ackedTotal += n; + return n; +} +``` + +This is the linchpin of at-least-once delivery: an entry that is never acked stays in the PEL until a claim moves it elsewhere. If your consumer crashes between processing and ack, the next claim sweep picks the entry back up. The one caveat is retention: `XADD MAXLEN ~` and `XTRIM` can release the entry's *payload* even while its ID is still in the PEL. The next `XAUTOCLAIM` returns those IDs in its `deletedMessages` list and removes them from the PEL inside the same command — the entry cannot be retried, so the caller should log it and route to a dead-letter store for audit. The example handles this explicitly in `autoclaim` further down. + +The trade-off is the opposite of pub/sub: a slow or crashed consumer doesn't lose messages, but it does mean your downstream system must be idempotent. If you process an order twice because the first attempt died after the side effect but before the ack, the second attempt must be safe. 
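
One way to make a redelivered entry safe is a once-per-ID guard key. The sketch below is a minimal illustration, not code from the demo: `handleOnce`, the `demo:processed:...` key layout, and the 24-hour guard lifetime are all assumptions, and it relies on the `NX` / `EX` option names that node-redis accepts for `set`.

```javascript
// Run `sideEffect` at most once per entry ID, then ack the entry.
// `client` is assumed to be a connected node-redis client.
async function handleOnce(client, streamKey, group, entryId, fields, sideEffect) {
  // Hypothetical key layout: one guard key per (group, entry ID).
  const guardKey = `demo:processed:${group}:${entryId}`;

  // SET ... NX returns null when the key already exists, so only the
  // first delivery of this entry gets a non-null reply. EX bounds how
  // long the guard lives (24 hours here, an arbitrary choice).
  const first = await client.set(guardKey, "1", { NX: true, EX: 86400 });

  if (first !== null) {
    try {
      await sideEffect(fields);
    } catch (err) {
      // Release the guard so a later redelivery can retry.
      await client.del(guardKey);
      throw err;
    }
  }

  // Ack in both cases: a duplicate delivery has nothing left to do.
  await client.xAck(streamKey, group, entryId);
  return first !== null;
}
```

The guard narrows the duplicate window but does not remove every failure mode: a process that dies after the `SET` and before the side effect completes leaves the guard in place, so the retry is skipped. When the side effect writes to a store that supports transactions, recording the entry ID in the same transaction as the write is the sturdier option.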
+ +## Multiple consumer groups, one stream + +The big difference between Redis Streams and a job queue is that any number of independent consumer groups can read the same stream. The demo sets up two groups on `demo:events:orders`: + +```javascript +await stream.ensureGroup("notifications", "0-0"); +await stream.ensureGroup("analytics", "0-0"); +``` + +Each group has its own cursor. Producing 5 events results in `notifications` and `analytics` each receiving all 5, with no coordination between them. Within `notifications`, the work is split across `worker-a` and `worker-b`: Redis hands each `XREADGROUP` call whatever entries are not yet delivered to anyone in the group, so adding a second worker doubles throughput without any rebalance logic. + +The `"0-0"` second argument means "deliver everything in the stream from the beginning" — useful in a demo and for fresh groups bootstrapping from history. In production, a brand-new group reading a long-existing stream usually starts at `"$"` ("only events after this point") and uses [`XRANGE`]({{< relref "/commands/xrange" >}}) explicitly if it needs history. + +## Recovering crashed consumers with XAUTOCLAIM + +The demo's "Crash next 3" button tells a chosen consumer to drop its next three deliveries on the floor without acking them — the same effect as a worker process dying mid-message. Those entries stay in the group's PEL with their delivery counter incremented. Once they have been idle for at least `claimMinIdleMs`, any healthy consumer in the group can rescue them by calling `XAUTOCLAIM` *with itself as the target*. 
`ConsumerWorker.reapIdlePel` wraps that pattern: + +```javascript +async reapIdlePel() { + const { claimed, deletedIds } = await this.stream.autoclaim( + this.group, + this.name, + { pageCount: 100, maxPages: 10 }, + ); + let processed = 0; + for (const [entryId, fields] of claimed) { + try { + if (this.processLatencyMs) { + await sleep(this.processLatencyMs); + } + await this._handleEntry(entryId, fields); + processed += 1; + } catch (err) { + console.error(`reap failed on ${entryId}: ${err.message}`); + } + } + this._reaped += processed; + return { claimed: claimed.length, deletedIds, processed }; +} +``` + +The underlying `stream.autoclaim` helper pages through the group's PEL with `XAUTOCLAIM`'s continuation cursor: + +```javascript +async autoclaim(group, consumer, options = {}) { + const { pageCount = 100, startId = "0-0", maxPages = 10 } = options; + const claimedAll = []; + const deletedAll = []; + let cursor = startId; + for (let i = 0; i < maxPages; i += 1) { + const reply = await this.redis.xAutoClaim( + this.streamKey, group, consumer, + this.claimMinIdleMs, cursor, { COUNT: pageCount }, + ); + for (const entry of reply.messages || []) { + const tuple = toTuple(entry); + if (tuple) claimedAll.push(tuple); + } + for (const id of reply.deletedMessages || []) { + deletedAll.push(String(id)); + } + const nextId = String(reply.nextId || "0-0"); + if (nextId === "0-0") break; + cursor = nextId; + } + this._claimedTotal += claimedAll.length; + return { claimed: claimedAll, deletedIds: deletedAll }; +} +``` + +A single `XAUTOCLAIM` call scans up to `pageCount` PEL entries starting at `startId`, reassigns the ones idle for at least `claimMinIdleMs` to the named consumer, and returns a continuation cursor in `reply.nextId`. For a full sweep, loop until the cursor returns to `"0-0"` (with a `maxPages` safety net so one call cannot monopolise a very large PEL). 
The delivery counter is incremented on every claim — after a few cycles you can use it to spot a *poison-pill* message that crashes every consumer that touches it, and route it to a dead-letter stream so the bad entry stops cycling. (New entries keep flowing past the poison pill — `XREADGROUP >` still delivers fresh work — but the bad entry's repeated reclaim wastes consumer time and keeps the PEL larger than it needs to be.) + +The `deletedMessages` list contains PEL entry IDs whose stream payload was already trimmed by the time the claim ran (typically because `MAXLEN ~` retention outran a slow consumer). `XAUTOCLAIM` removes those dangling slots from the PEL itself, so the caller does *not* need to `XACK` them — but the entries cannot be retried either, so log and route them to a dead-letter store for offline inspection. Redis 7.0 introduced this third return element; the example requires Redis 7.0+ for that reason. + +`reapIdlePel` is the right primitive for the recovery path because it claims and processes in one step: every entry the call returned is now in *this* consumer's PEL, so the same consumer is responsible for processing and acking it. In production each consumer runs `reapIdlePel` periodically (every few seconds, on a timer) so a crashed peer's entries never sit invisibly. The demo exposes it as a manual button so you can trigger the reap after waiting for the idle threshold. + +`XCLAIM` (singular, no auto) does the same thing for a specific list of entry IDs you already have in hand — useful when you want to take ownership of one known stuck entry, or when you need to move a specific consumer's PEL to a peer (the case the demo's "Remove consumer" button handles via `handoverPending`). `XAUTOCLAIM` cannot filter by source consumer, so it cannot be used for a per-consumer handover. + +## Replay with XRANGE + +`XRANGE` reads a slice of history. 
It is completely independent of any consumer group — no cursors move, no acks happen — so it is safe to call any number of times, from any process: + +```javascript +async replay(startId = "-", endId = "+", count = 100) { + const raw = await this.redis.xRange(this.streamKey, startId, endId, { + COUNT: count, + }); + const out = []; + for (const entry of raw || []) { + const tuple = toTuple(entry); + if (tuple) out.push(tuple); + } + return out; +} +``` + +The special IDs `-` and `+` mean "from the very beginning" and "to the very end". You can also pass real IDs (`1716998413541-0`) or just the millisecond part (`1716998413541`, which Redis interprets as "any entry with this timestamp"). + +Typical uses: + +* **Bootstrapping a new projection** — read the entire stream from `-` and build a derived view in another store (a search index, a SQL table, a different cache). Doing this against a consumer group would consume the entries; `XRANGE` lets you do it without disrupting live consumers. +* **Auditing recent activity** — read the last few minutes by ID range without touching any group cursor. +* **Debugging** — fetch one specific entry by its ID, or a tight range around an incident timestamp, to see exactly what producers wrote. + +## The consumer worker thread + +`ConsumerWorker` wraps the `XREADGROUP` → process → `XACK` loop in an async task +([source](https://github.com/redis/docs/blob/main/content/develop/use-cases/streaming/nodejs/consumerWorker.js)): + +```javascript +async _run() { + while (!this._stopped) { + if (this._paused) { + await sleep(50); + continue; + } + + let entries; + try { + // Use the dedicated blocking client so the shared client stays + // free for HTTP-handler commands. 
+ entries = await this.blockingStream.consume( + this.group, this.name, 10, 500, + ); + } catch (err) { + console.error(`[${this.group}/${this.name}] read failed: ${err.message}`); + await sleep(500); + continue; + } + + for (const [entryId, fields] of entries) { + if (this._stopped) break; + await this._dispatch(entryId, fields); + } + } + this._runPromise = null; +} +``` + +`_dispatch` wraps the per-entry handling in a try/catch so a failure (typically `XACK` against Redis) doesn't kill the loop. The actual handler either acks (the normal path) or, when the demo has asked the worker to "crash", drops the entry on the floor and increments a counter so the UI can show what is currently in the PEL waiting to be claimed. + +Node.js is single-threaded for JavaScript execution, so the worker "thread" is really an async task on the event loop. That's fine for the demo — concurrent `XREADGROUP` calls from multiple workers run interleaved, each parked on its own Redis client until either a message arrives or the BLOCK timeout expires. Two architectural choices follow from the single connection-per-blocking-call model: + +* Each worker owns its **own duplicated Redis client** dedicated to `XREADGROUP BLOCK`. node-redis serialises commands on a single connection, so sharing one client across workers (or with the HTTP handlers) would serialise their reads through that socket. +* All other commands (`XACK`, `XAUTOCLAIM`, `XCLAIM`, `XPENDING`, `XADD`) flow through a shared `EventStream` so the produced/acked/claimed counters aggregate across workers. + +Recovery of stuck PEL entries — this consumer's, after a restart, or another consumer's, after a crash — runs through a separate `reapIdlePel` method rather than the read loop. That method calls `XAUTOCLAIM` with this consumer as the target, then processes whatever was claimed in the same flow as new entries. 
This is the textbook Streams pattern: each consumer is its own reaper, running `XAUTOCLAIM(self)` periodically (or on demand) so a crashed peer's entries never sit invisibly in the PEL. The demo's "XAUTOCLAIM to selected" button calls `reapIdlePel` on the chosen consumer; in production you would run it from a timer every few seconds. + +Note that the worker's main read loop deliberately does *not* call `XREADGROUP 0` to drain its own PEL on every iteration. That would re-deliver every pending entry continuously and *reset its idle counter to zero* each time, which would keep crashed entries below the `XAUTOCLAIM` threshold forever. Using `XAUTOCLAIM(self)` as the recovery primitive — which only fires for entries idle longer than `claimMinIdleMs` — avoids that whole class of bug. + +The pause and crash levers exist only for the demo. A real consumer is just the read-process-ack loop — everything else in this class is instrumentation. + +## Prerequisites + +* Redis 7.0 or later. `XAUTOCLAIM` was added in Redis 6.2, but its reply gained a third + element (the list of deleted IDs) in 7.0; the example relies on that shape. +* Node.js 18 or later. +* The `node-redis` 5.x client. The demo's `package.json` declares it as a dependency: + + ```bash + npm install + ``` + +If your Redis server is running elsewhere, start the demo with `--redis-host` and `--redis-port`. + +## Running the demo + +### Get the source files + +The demo consists of three JavaScript files plus `package.json`. 
Download them from the [`nodejs` source folder](https://github.com/redis/docs/tree/main/content/develop/use-cases/streaming/nodejs) on GitHub, or grab them with `curl`: + +```bash +mkdir streaming-demo && cd streaming-demo +BASE=https://raw.githubusercontent.com/redis/docs/main/content/develop/use-cases/streaming/nodejs +curl -O $BASE/package.json +curl -O $BASE/eventStream.js +curl -O $BASE/consumerWorker.js +curl -O $BASE/demoServer.js +``` + +### Start the demo server + +From that directory: + +```bash +npm install +node demoServer.js +``` + +You should see: + +```text +Deleting any existing data at key 'demo:events:orders' for a clean demo run (pass --no-reset to keep it). +Redis streaming demo server listening on http://127.0.0.1:8083 +Using Redis at localhost:6379 with stream key 'demo:events:orders' (MAXLEN ~ 2000) +Seeded 3 consumer(s) across 2 group(s) +``` + +By default the demo wipes the configured stream key on startup so each run starts from a clean state. Pass `--no-reset` to keep any existing data at the key (useful when re-running against the same stream to inspect prior state), or `--stream-key ` to point the demo at a different key entirely. + +Open [http://127.0.0.1:8083](http://127.0.0.1:8083) in a browser. You can: + +* **Produce** any number of events of a chosen type (or random types). Watch the stream length grow and the tail update. +* See each **consumer group**: its `last-delivered-id`, the size of its pending list, and the consumers in it. Each consumer shows its processed count, pending count, and idle time. +* **Add or remove** consumers within a group at runtime to see Redis split the work across the new shape. +* Click **Crash next 3** on a consumer to drop its next three deliveries — the same effect as a worker process dying after `XREADGROUP` but before `XACK`. Watch the **Pending entries (XPENDING)** panel fill up. 
+* Wait until the idle time exceeds the threshold (default 5000 ms), pick a healthy target consumer, and click **XAUTOCLAIM to selected** — the stuck entries are reassigned and the delivery counter increments. +* **Replay (XRANGE)** any range to confirm the full history is independent of consumer-group state. +* **XTRIM** with an approximate `MAXLEN` to bound retention. Note that an approximate trim only releases whole macro-nodes — `MAXLEN ~ 50` on a small stream may not delete anything; on a 300-entry stream it typically lands at around 100. +* Click **Reset demo** to drop the stream and re-seed the default groups. + +## Production usage + +### Pick retention by length or by minimum ID + +The demo uses `MAXLEN ~` on every `XADD`. Two alternatives are worth considering: + +* `MINID ~ ` — keep only entries newer than an ID. If you want "the last 24 hours", compute the wall-clock cutoff and pass `XTRIM MINID ~ -0`. This is the right pattern when retention is time-bounded. +* No cap on `XADD` plus a periodic `XTRIM` job — useful if your producer is hot and the per-`XADD` work has to stay minimal, or if retention rules are complex (a separate process can also factor in consumer-group lag). + +In all three cases, prefer approximate trimming and write the `~` explicitly: Redis trims exactly when the modifier is omitted. Use exact trimming (`MAXLEN n` or `MINID id` without `~`) only when you genuinely need an exact count. + +### Don't let consumer-group lag silently grow + +`XINFO GROUPS` reports each group's `lag` (entries the group has not yet read) and `pending` (entries delivered but not acked). In production, alert on either of these crossing a threshold — a steadily growing pending count usually means consumers are crashing without `XAUTOCLAIM` running, and a growing lag means consumers can't keep up with producers. + +The same applies inside a group: `XINFO CONSUMERS` reports per-consumer pending counts and idle times, so you can spot one slow consumer holding entries that the rest of the group is waiting on. 
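
The alerting rule above can be sketched as a small pure function. This is an illustration under assumptions rather than part of the demo: `checkGroups`, the threshold values, and the plain `{ name, pending, lag }` row shape are hypothetical, and the rows would have to be mapped from whatever your client returns for `XINFO GROUPS`, since reply field names can differ between client versions. Redis can report `lag` as null when it cannot compute it, so the sketch flags that case separately.

```javascript
// Turn XINFO GROUPS-style rows into a list of alerts.
// Each row is assumed to look like { name, pending, lag }.
function checkGroups(rows, { maxLag = 1000, maxPending = 100 } = {}) {
  const alerts = [];
  for (const { name, pending, lag } of rows) {
    if (lag === null || lag === undefined) {
      // Redis could not compute the lag for this group.
      alerts.push({ group: name, reason: "lag unknown" });
    } else if (lag > maxLag) {
      // Consumers are not keeping up with producers.
      alerts.push({ group: name, reason: `lag ${lag} > ${maxLag}` });
    }
    if (pending > maxPending) {
      // Entries are being delivered but not acked.
      alerts.push({ group: name, reason: `pending ${pending} > ${maxPending}` });
    }
  }
  return alerts;
}

// Example rows, made up for illustration.
const sampleRows = [
  { name: "notifications", pending: 3, lag: 0 },
  { name: "analytics", pending: 500, lag: 5000 },
];
console.log(checkGroups(sampleRows));
```

Keeping the check free of I/O means the same function can run from a timer, a health endpoint, or a test, with the Redis call living at the edge.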
+ +### Make consumer logic idempotent + +`XAUTOCLAIM` can re-deliver an entry to a different consumer after a crash. If your processing has side effects (sending email, charging a card, updating a downstream store), make sure the same entry processed twice gives the same result — use an idempotency key, an upsert with conditional check, or a once-per-id guard table. Redis Streams cannot give you exactly-once semantics on its own. + +### Bound the delivery counter as a poison-pill signal + +`XPENDING` returns each entry's delivery count, incremented on every claim. If an entry has been delivered (and dropped) several times, the next consumer is unlikely to fare better. After some threshold — `deliveries >= 5`, say — route the entry to a *dead-letter stream*, ack it on the original group, and alert. New entries keep flowing past a poison pill (`XREADGROUP >` still delivers fresh work), but the bad entry's repeated reclaim wastes consumer time and keeps the PEL bigger than it needs to be — without a DLQ threshold it can also slowly trip retention/lag alerts. + +### Partition by tenant or entity for scale + +A single Redis Stream is a single key, and on a Redis Cluster a single key lives on a single shard. If your throughput exceeds what one shard can handle, partition the stream — for example by tenant ID (`events:orders:{tenant_a}`, `events:orders:{tenant_b}`) — so different tenants land on different shards. Hash-tags (`{tenant_a}`) keep all related streams on the same shard if you need to multi-stream atomically. + +Per-entity partitioning (`events:order:{order_id}`) is the canonical pattern when you treat each entity's stream as the event-sourcing log for that entity: every state change for one order goes on its own stream, which is also bounded in size by the entity's lifetime. + +### Use a separate consumer pool per group + +The demo runs every consumer in one Node.js process. 
In production each consumer group is usually its own deployment — its own pool of pods or VMs — so a slow projection in `analytics` cannot pull `notifications` workers off their stream. Each pod runs one consumer per CPU core (or as many async loops as the event loop can host before per-entry processing becomes the throughput bottleneck), with `XAUTOCLAIM` either embedded in the consumer loop (every N reads, claim idle entries to self) or run by a separate reaper. + +### One Redis client per blocking `XREADGROUP` call + +`XREADGROUP BLOCK` parks the underlying connection until either an entry arrives or the BLOCK timeout expires. node-redis sends commands on a single TCP connection in arrival order, so sharing one client across multiple consumers (or with HTTP-handler commands) serialises every command behind any in-flight blocking read. The demo gives each worker its own duplicated client for the blocking read, and routes `XACK` / `XAUTOCLAIM` / `XCLAIM` / `XADD` / inspection commands through a shared client. The same pattern (or a connection pool that hands out one connection per blocking call) applies to any production deployment. + +### Don't read with XREAD (no group) and then try to ack + +`XREAD` and `XREADGROUP` are different mechanisms. `XREAD` is a tail-the-log read with no consumer-group state — entries are not added to any PEL, and you cannot `XACK` them. If you want at-least-once delivery and crash recovery, you must read through a consumer group. + +`XREAD` is still useful for read-only tail clients (a UI streaming events, a debugger, a `tail -f`-style command-line tool). It's just not part of the at-least-once path. 
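
A read-only tail client might look like the sketch below. It is not part of the demo: `tailOnce` is a hypothetical helper, and the reply shape it unpacks (an array of `{ name, messages: [{ id, message }] }`, or null on timeout) is an assumption about what node-redis returns for `xRead`. The point it illustrates is that `XREAD` keeps no server-side cursor, so the caller has to carry the last seen ID from one call to the next.

```javascript
// One step of a read-only tail: no group, no PEL, no XACK.
// Returns the new entries plus the cursor to pass to the next call.
async function tailOnce(client, streamKey, lastId, blockMs = 1000) {
  const reply = await client.xRead(
    { key: streamKey, id: lastId },
    { BLOCK: blockMs, COUNT: 100 },
  );
  // Assumed: a null reply means the block timed out with nothing new.
  if (!reply) return { lastId, entries: [] };

  const entries = [];
  for (const stream of reply) {
    for (const { id, message } of stream.messages) {
      entries.push([id, message]);
      lastId = id; // advance the client-side cursor
    }
  }
  return { lastId, entries };
}

// Usage sketch: start at "$" (only entries newer than now), then keep
// feeding the returned cursor back in.
//
//   let cursor = "$";
//   for (;;) {
//     const step = await tailOnce(tailClient, "demo:events:orders", cursor);
//     cursor = step.lastId;
//     step.entries.forEach(([id, fields]) => console.log(id, fields));
//   }
```

With a finite block timeout, an entry appended between two `$` reads can be missed, because `$` is re-evaluated on every call. Blocking indefinitely on the first read, or resolving the stream's current last ID before starting, avoids that gap.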
+ +### Inspect the stream directly with redis-cli + +When testing or troubleshooting, inspect the stream directly to confirm the consumer state is what you expect: + +```bash +# Stream summary +redis-cli XLEN demo:events:orders +redis-cli XINFO STREAM demo:events:orders + +# Group cursors and pending counts +redis-cli XINFO GROUPS demo:events:orders + +# Consumers within a group +redis-cli XINFO CONSUMERS demo:events:orders notifications + +# Pending entries with idle time and delivery count +redis-cli XPENDING demo:events:orders notifications - + 20 + +# Tail the stream live (no consumer-group state — like tail -f) +redis-cli XREAD BLOCK 0 STREAMS demo:events:orders '$' + +# Replay a range +redis-cli XRANGE demo:events:orders - + COUNT 50 +``` + +If a group's `lag` is growing while consumers' `idle` times are short, consumers are healthy but producers are outpacing them — add more consumers. If `pending` is growing while `lag` is small, consumers are *receiving* entries but not *acking* them — either they are crashing mid-message or your acking logic has a bug. + +## Learn more + +This example uses the following Redis commands: + +* [`XADD`]({{< relref "/commands/xadd" >}}) to append an event with an approximate `MAXLEN` cap. +* [`XREADGROUP`]({{< relref "/commands/xreadgroup" >}}) to read new entries for a consumer in a group. +* [`XACK`]({{< relref "/commands/xack" >}}) to acknowledge a processed entry. +* [`XAUTOCLAIM`]({{< relref "/commands/xautoclaim" >}}) to reassign idle pending entries to a healthy consumer. +* [`XCLAIM`]({{< relref "/commands/xclaim" >}}) to take ownership of a specific list of pending entry IDs by hand (used by `handoverPending` to move a leaving consumer's PEL to a peer, since `XAUTOCLAIM` has no source-consumer filter). +* [`XRANGE`]({{< relref "/commands/xrange" >}}) for replay and audit, independent of consumer-group state. 
+* [`XPENDING`]({{< relref "/commands/xpending" >}}) to inspect the per-group pending list with idle times and delivery counts. +* [`XTRIM`]({{< relref "/commands/xtrim" >}}) for explicit retention enforcement. +* [`XGROUP CREATE`]({{< relref "/commands/xgroup-create" >}}) and + [`XGROUP DELCONSUMER`]({{< relref "/commands/xgroup-delconsumer" >}}) to manage groups and consumers. +* [`XINFO STREAM`]({{< relref "/commands/xinfo-stream" >}}), + [`XINFO GROUPS`]({{< relref "/commands/xinfo-groups" >}}), and + [`XINFO CONSUMERS`]({{< relref "/commands/xinfo-consumers" >}}) for observability. + +See the [`node-redis` documentation]({{< relref "/develop/clients/nodejs" >}}) for the full client reference, and the [Streams overview]({{< relref "/develop/data-types/streams" >}}) for the deeper conceptual model — consumer groups, the PEL, claim semantics, capped streams, and the differences with Kafka partitions. diff --git a/content/develop/use-cases/streaming/nodejs/consumerWorker.js b/content/develop/use-cases/streaming/nodejs/consumerWorker.js new file mode 100644 index 0000000000..217d542d17 --- /dev/null +++ b/content/develop/use-cases/streaming/nodejs/consumerWorker.js @@ -0,0 +1,321 @@ +"use strict"; + +/** + * Background consumer "thread" for a single consumer in a consumer group. + * + * Node.js is single-threaded for JavaScript execution, so a worker is + * really an async loop on the event loop. Each worker uses its own + * duplicated Redis client because `XREADGROUP BLOCK` parks the + * connection until either an entry arrives or the timeout expires — + * other handlers would otherwise have to wait behind it. + * + * The loop is `XREADGROUP >` → process → `XACK`. Recovery of stuck + * PEL entries (this consumer's, after a restart, or another + * consumer's, after a crash) runs through `reapIdlePel()`, which is + * the textbook Streams pattern: each consumer periodically calls + * `XAUTOCLAIM` with itself as the target, then processes whatever + * it claimed. 
The demo's "XAUTOCLAIM to selected" button is exactly + * that call. + * + * Two demo-only levers are wired into the loop: + * + * - `pause()` parks the worker (so its pending entries age into the + * XAUTOCLAIM window without being consumed by `>` reads). + * - `crashNext(n)` tells the worker to drop its next `n` deliveries + * on the floor without acking them — the same effect as a worker + * process dying mid-message. Those entries stay in the group's PEL + * until `reapIdlePel` recovers them. + * + * Real consumers do not need either lever; they only need + * `XREADGROUP` → process → `XACK` in `_run` and a periodic + * `reapIdlePel` call to recover stuck entries. + */ + +const { EventStream } = require("./eventStream"); + +function sleep(ms) { + return new Promise((resolve) => setTimeout(resolve, ms)); +} + +class ConsumerWorker { + /** + * @param {object} options + * @param {EventStream} options.stream + * Shared `EventStream` for `XACK`, `XAUTOCLAIM`, stats, etc. + * All workers in the same demo should pass the same instance + * so the produced/acked/claimed counters aggregate. + * @param {EventStream} [options.blockingStream] + * Optional separate `EventStream` whose Redis client is used + * for `XREADGROUP BLOCK`. node-redis serialises commands per + * connection, so a blocking read parks the client until the + * BLOCK timeout elapses or an entry arrives — sharing one + * client across workers (or with the HTTP handlers) would + * serialise all reads through that socket. Defaults to + * `stream` if not supplied. 
+ * @param {string} options.group + * @param {string} options.name + * @param {number} [options.processLatencyMs=25] + * @param {number} [options.recentCapacity=20] + */ + constructor({ + stream, + blockingStream, + group, + name, + processLatencyMs = 25, + recentCapacity = 20, + }) { + if (!stream || !group || !name) { + throw new Error("stream, group and name are required."); + } + this.stream = stream; + this.blockingStream = blockingStream || stream; + this.group = group; + this.name = name; + this.processLatencyMs = processLatencyMs; + this._recentCapacity = recentCapacity; + + /** @type {Array} */ + this._recent = []; + this._processed = 0; + this._reaped = 0; + this._crashedDrops = 0; + + this._paused = false; + this._crashNext = 0; + this._stopped = true; + /** @type {Promise | null} */ + this._runPromise = null; + } + + // ------------------------------------------------------------------ + // Lifecycle + // ------------------------------------------------------------------ + + start() { + if (this._runPromise && !this._stopped) return; + this._stopped = false; + this._runPromise = this._run().catch((err) => { + console.error( + `[${this.group}/${this.name}] worker loop exited with error: ${ + (err && err.message) || err + }`, + ); + }); + } + + async stop(joinTimeoutMs = 2000) { + this._stopped = true; + if (!this._runPromise) return; + const timeout = new Promise((resolve) => + setTimeout(() => resolve("timeout"), joinTimeoutMs), + ); + const result = await Promise.race([ + this._runPromise.then(() => "ok"), + timeout, + ]); + if (result === "ok") { + this._runPromise = null; + } + } + + // ------------------------------------------------------------------ + // Demo levers + // ------------------------------------------------------------------ + + pause() { + this._paused = true; + } + + resume() { + this._paused = false; + } + + /** + * Drop the next `count` deliveries without acking them. 
+ * + * The entries stay in the group's PEL with their delivery counter + * incremented, so `XAUTOCLAIM` can recover them once they exceed + * the idle threshold. + * + * @param {number} count + */ + crashNext(count) { + const n = Math.max(0, Number.parseInt(count, 10) || 0); + this._crashNext += n; + } + + // ------------------------------------------------------------------ + // Introspection + // ------------------------------------------------------------------ + + recent() { + return this._recent.slice(); + } + + status() { + return { + name: this.name, + group: this.group, + processed: this._processed, + reaped: this._reaped, + crashed_drops: this._crashedDrops, + paused: this._paused, + crash_queued: this._crashNext, + alive: this._runPromise !== null && !this._stopped, + }; + } + + // ------------------------------------------------------------------ + // Recovery + // ------------------------------------------------------------------ + + /** + * Run `XAUTOCLAIM` into self and process the claimed entries. + * + * Returns a summary `{claimed, deletedIds, processed}`. Safe to call + * from any handler — the heavy lifting is `stream.autoclaim` (a + * Redis call) and the sequential per-entry dispatch via + * `_handleEntry`. + * + * `deletedIds` are PEL entries whose stream payload was already + * trimmed by `MAXLEN ~` / `XTRIM` before the sweep ran. Redis 7+ + * removes them from the PEL inside `XAUTOCLAIM` itself, so the + * caller does not have to `XACK` them; they are reported so the + * caller can route them to a dead-letter store. 
+ */ + async reapIdlePel() { + const { claimed, deletedIds } = await this.stream.autoclaim( + this.group, + this.name, + { pageCount: 100, maxPages: 10 }, + ); + let processed = 0; + for (const [entryId, fields] of claimed) { + try { + if (this.processLatencyMs) { + await sleep(this.processLatencyMs); + } + await this._handleEntry(entryId, fields); + processed += 1; + } catch (err) { + console.error( + `[${this.group}/${this.name}] reap failed on ${entryId}: ${ + (err && err.message) || err + }`, + ); + } + } + this._reaped += processed; + return { claimed: claimed.length, deletedIds, processed }; + } + + // ------------------------------------------------------------------ + // Main loop + // ------------------------------------------------------------------ + + async _run() { + while (!this._stopped) { + if (this._paused) { + await sleep(50); + continue; + } + + let entries; + try { + // Use the dedicated blocking client so the shared client stays + // free for HTTP-handler commands. + entries = await this.blockingStream.consume( + this.group, + this.name, + 10, + 500, + ); + } catch (err) { + // Don't kill the loop on a transient Redis error; a real + // consumer would log this and back off. + console.error( + `[${this.group}/${this.name}] read failed: ${ + (err && err.message) || err + }`, + ); + await sleep(500); + continue; + } + + for (const [entryId, fields] of entries) { + if (this._stopped) break; + await this._dispatch(entryId, fields); + } + } + this._runPromise = null; + } + + /** + * Wrap per-entry processing so a single failure (typically `XACK` + * against Redis) doesn't kill the loop — that would silently halt + * this consumer while every other entry sat in its PEL waiting for + * XAUTOCLAIM. The entry stays unacked; the next `reapIdlePel` call + * (here or on any consumer in the group) can recover it once it + * exceeds the idle threshold. 
+ */ + async _dispatch(entryId, fields) { + if (this.processLatencyMs) { + await sleep(this.processLatencyMs); + } + try { + await this._handleEntry(entryId, fields); + } catch (err) { + console.error( + `[${this.group}/${this.name}] failed to handle ${entryId}: ${ + (err && err.message) || err + }`, + ); + this._pushRecent({ + id: entryId, + type: (fields && fields.type) || "", + fields, + acked: false, + note: `handler error: ${(err && err.message) || err}`, + }); + } + } + + async _handleEntry(entryId, fields) { + // Capture-and-decrement is safe without a lock because Node.js + // runs JS on a single thread — no other handler can interleave + // between the read and the write. + const drop = this._crashNext > 0; + if (drop) { + this._crashNext -= 1; + this._crashedDrops += 1; + this._pushRecent({ + id: entryId, + type: (fields && fields.type) || "", + fields, + acked: false, + note: "dropped (simulated crash)", + }); + return; + } + + await this.stream.ack(this.group, [entryId]); + this._processed += 1; + this._pushRecent({ + id: entryId, + type: (fields && fields.type) || "", + fields, + acked: true, + note: "", + }); + } + + _pushRecent(entry) { + this._recent.unshift(entry); + if (this._recent.length > this._recentCapacity) { + this._recent.length = this._recentCapacity; + } + } +} + +module.exports = { ConsumerWorker }; diff --git a/content/develop/use-cases/streaming/nodejs/demoServer.js b/content/develop/use-cases/streaming/nodejs/demoServer.js new file mode 100644 index 0000000000..e6ac894aa5 --- /dev/null +++ b/content/develop/use-cases/streaming/nodejs/demoServer.js @@ -0,0 +1,1156 @@ +#!/usr/bin/env node +"use strict"; + +/** + * Redis streaming demo server. + * + * Run this file and visit http://localhost:8083 to watch a Redis Stream + * in action: producers append events to a single stream, two independent + * consumer groups read the same stream at their own pace, and within + * the `notifications` group two consumers share the work. 
+ * + * Use the UI to: + * + * - Produce events into the stream. + * - Watch each consumer group's last-delivered ID, PEL count, and + * the consumers inside it. + * - Drop the next N messages from a chosen consumer to simulate a + * crash mid-processing, then run XAUTOCLAIM to reassign the stuck + * entries to a healthy consumer. + * - Replay any ID range with XRANGE to confirm the history is + * independent of consumer-group state. + * - Trim the stream with XTRIM to bound retention. + */ + +const http = require("http"); +const { URL, URLSearchParams } = require("url"); +const { createClient } = require("redis"); + +const { EventStream } = require("./eventStream"); +const { ConsumerWorker } = require("./consumerWorker"); + +const EVENT_TYPES = [ + "order.placed", + "order.paid", + "order.shipped", + "order.cancelled", +]; + +const DEFAULT_GROUPS = { + notifications: ["worker-a", "worker-b"], + analytics: ["worker-c"], +}; + +const HTML_TEMPLATE = ` + + + + + Redis Streaming Demo + + + +
+ node-redis + Node.js standard http module
+ Redis Streaming Demo
+ Producers append events to a single Redis Stream (__STREAM_KEY__). Two consumer groups read the same stream independently: notifications shares its work across two consumers, analytics processes the full flow on its own. Acknowledge with XACK, recover crashed deliveries with XAUTOCLAIM, replay any range with XRANGE, and bound retention with XTRIM.
+
+ Stream state
+ Loading...
+
+ Produce events
+ Events are appended with XADD with an approximate MAXLEN ~ __MAXLEN__ retention cap.
+
+ Replay range (XRANGE)
+ Reads a slice of history. Replay is independent of any consumer group: no cursors move, no acks happen.
+
+ Trim retention (XTRIM)
+ Cap the stream length. Approximate trimming releases whole macro-nodes, which is much cheaper than exact trimming.
+
+ Consumer groups
+ Loading...
+
+ Pending entries (XPENDING)
+ Entries delivered to a consumer that haven't been acked yet. Idle time ≥ __CLAIM_IDLE__ ms is eligible for XAUTOCLAIM.
+ Loading...
+
+ Last result
+ Produce events, replay a range, or trigger an autoclaim to see results.
+ + + + +`; + +/** + * In-memory registry of consumer workers across all groups. + * + * The Node.js event loop dispatches HTTP handlers one at a time, but + * each handler may `await` Redis calls, so add/remove operations can + * interleave around `await` points. The registry's mutations are + * intentionally short and bracketed by a single Redis round trip, so + * we don't add an explicit lock — but we always snapshot the + * registry into a local array before iterating asynchronously. + */ +class StreamingDemo { + /** + * @param {object} options + * @param {EventStream} options.stream + * @param {() => Promise} options.makeClient + * Factory that returns a fresh, connected Redis client. Each + * consumer worker uses its own client because `XREADGROUP BLOCK` + * parks the connection. + * @param {number} options.maxlenApprox + * @param {number} options.claimMinIdleMs + * @param {string} options.streamKey + */ + constructor({ + stream, + makeClient, + maxlenApprox, + claimMinIdleMs, + streamKey, + }) { + this.stream = stream; + this.makeClient = makeClient; + this.maxlenApprox = maxlenApprox; + this.claimMinIdleMs = claimMinIdleMs; + this.streamKey = streamKey; + /** @type {Map} */ + this.workers = new Map(); + /** @type {Set} names currently being constructed by addWorker. */ + this._inFlight = new Set(); + } + + _key(group, name) { + return `${group}|${name}`; + } + + async seed(groups) { + for (const [group, names] of Object.entries(groups)) { + await this.stream.ensureGroup(group, "0-0"); + for (const name of names) { + await this.addWorker(group, name); + } + } + return Object.values(groups).reduce((sum, list) => sum + list.length, 0); + } + + /** + * @returns {Promise} + */ + async addWorker(group, name) { + const key = this._key(group, name); + // Reserve the name atomically before the first `await`. 
Without + // this, two concurrent `addWorker(group, name)` calls would both + // pass the duplicate check before either inserted into + // `this.workers`, both proceed through ensureGroup + makeClient + + // worker.start, and both call `this.workers.set(...)` — leaving + // two live ConsumerWorkers for the same name with one of them + // leaking (its client never gets closed because the Map only + // holds the second entry). `_inFlight` is checked alongside + // `this.workers` so the duplicate check is atomic against + // concurrent callers on the single-threaded event loop. + if (this.workers.has(key) || this._inFlight.has(key)) return false; + this._inFlight.add(key); + let client; + try { + await this.stream.ensureGroup(group, "0-0"); + + // Each worker owns its own Redis client because XREADGROUP BLOCK + // parks the connection. Sharing the main client across workers + // would serialise their reads through one socket. The blocking + // client only carries the `xReadGroup` call; XACK, XAUTOCLAIM, + // and stats updates go through the shared `this.stream` so the + // counters aggregate. + client = await this.makeClient(); + const blockingStream = new EventStream({ + redisClient: client, + streamKey: this.streamKey, + maxlenApprox: this.maxlenApprox, + claimMinIdleMs: this.claimMinIdleMs, + }); + const worker = new ConsumerWorker({ + stream: this.stream, + blockingStream, + group, + name, + }); + worker.start(); + this.workers.set(key, { worker, client }); + return true; + } catch (err) { + // Best-effort cleanup of any partially-opened client. + if (client) { + try { await client.quit(); } catch { /* ignore */ } + } + throw err; + } finally { + this._inFlight.delete(key); + } + } + + /** + * Remove a consumer safely. + * + * `XGROUP DELCONSUMER` destroys the consumer's PEL entries outright, + * so any pending message it still owned would become unreachable. + * Before deleting, hand its PEL off to another consumer in the + * same group with `XCLAIM`. 
Without a peer consumer to take over, + * refuse to delete and leave the worker in place so the user can + * add a peer first. + * + * @returns {Promise<{removed: boolean, reason?: string, message?: string, handed_over_to?: string, handed_over_count?: number}>} + */ + async removeWorker(group, name) { + const key = this._key(group, name); + const entry = this.workers.get(key); + if (!entry) { + return { removed: false, reason: "not-found" }; + } + + const peers = []; + for (const [k, v] of this.workers) { + if (k === key) continue; + if (v.worker.group === group) peers.push(v.worker.name); + } + if (peers.length === 0) { + return { + removed: false, + reason: "no-peer", + message: + `${group}/${name} still owns pending entries and is the only ` + + "consumer in its group; add another consumer first so its " + + "PEL can be handed over before deletion.", + }; + } + + const handoverTarget = peers[0]; + + // Run the handover BEFORE removing the worker from the registry. + // XGROUP DELCONSUMER destroys the source's pending list, so any + // handover failure must abort the removal — leaving the worker in + // place lets the user retry once the underlying Redis issue is + // resolved. The worker keeps consuming during the handover; + // XCLAIM with MIN-IDLE-TIME 0 races acks gracefully (anything the + // worker acks during the window is gone from XPENDING and isn't + // moved). + let claimed; + try { + claimed = await this.stream.handoverPending(group, name, handoverTarget); + } catch (err) { + return { + removed: false, + reason: "handover-failed", + message: + `Handover from ${group}/${name} to ${handoverTarget} failed ` + + `before XGROUP DELCONSUMER could run: ${err.message}. 
` + + `${group}/${name} is still in the group; retry the remove or ` + + "investigate the Redis error before deleting (DELCONSUMER would " + + "destroy the source consumer's pending entries).", + }; + } + + // Handover succeeded; now safe to remove from the registry, stop + // the worker, close its blocking client, and destroy the consumer + // record in Redis. + this.workers.delete(key); + await entry.worker.stop(); + try { + await entry.client.quit(); + } catch { + // ignore close errors + } + await this.stream.deleteConsumer(group, name); + return { + removed: true, + handed_over_to: handoverTarget, + handed_over_count: claimed, + }; + } + + /** + * @param {string} group + * @param {string} name + */ + getWorker(group, name) { + const entry = this.workers.get(this._key(group, name)); + return entry ? entry.worker : null; + } + + /** + * Stable list of `{group, name, worker}` for iteration outside of + * mutation paths. + */ + workersSnapshot() { + const list = []; + for (const [, v] of this.workers) { + list.push({ group: v.worker.group, name: v.worker.name, worker: v.worker }); + } + return list; + } + + async stopAll() { + const entries = Array.from(this.workers.values()); + this.workers.clear(); + await Promise.all( + entries.map(async (entry) => { + await entry.worker.stop(); + try { + await entry.client.quit(); + } catch { + // ignore + } + }), + ); + } + + async reset() { + await this.stopAll(); + await this.stream.deleteStream(); + this.stream.resetStats(); + return this.seed(DEFAULT_GROUPS); + } +} + +function parseArgs() { + const args = process.argv.slice(2); + const config = { + host: "127.0.0.1", + port: 8083, + redisHost: "localhost", + redisPort: 6379, + streamKey: "demo:events:orders", + maxlen: 2000, + claimIdleMs: 5000, + resetOnStart: true, + }; + + for (let i = 0; i < args.length; i += 1) { + switch (args[i]) { + case "--host": + config.host = args[++i]; + break; + case "--port": + config.port = Number.parseInt(args[++i], 10); + break; + case 
"--redis-host": + config.redisHost = args[++i]; + break; + case "--redis-port": + config.redisPort = Number.parseInt(args[++i], 10); + break; + case "--stream-key": + config.streamKey = args[++i]; + break; + case "--maxlen": + config.maxlen = Number.parseInt(args[++i], 10); + break; + case "--claim-idle-ms": + config.claimIdleMs = Number.parseInt(args[++i], 10); + break; + case "--no-reset": + config.resetOnStart = false; + break; + default: + break; + } + } + return config; +} + +function readBody(req) { + return new Promise((resolve, reject) => { + let body = ""; + req.on("data", (chunk) => { + body += chunk; + }); + req.on("end", () => resolve(body)); + req.on("error", reject); + }); +} + +function sendJson(res, status, payload) { + const body = JSON.stringify(payload); + res.writeHead(status, { "Content-Type": "application/json" }); + res.end(body); +} + +function fakePayload() { + const customers = ["alice", "bob", "carol", "dan", "erin"]; + const orderId = `o-${Math.floor(1000 + Math.random() * 9000)}`; + const customer = customers[Math.floor(Math.random() * customers.length)]; + const amount = (5 + Math.random() * 245).toFixed(2); + return { order_id: orderId, customer, amount }; +} + +function htmlPage(stream) { + return HTML_TEMPLATE.replace("__STREAM_KEY__", stream.streamKey) + .replace("__MAXLEN__", String(stream.maxlenApprox)) + .replace("__CLAIM_IDLE__", String(stream.claimMinIdleMs)); +} + +// -------------------------------------------------------------------- +// Request handlers +// -------------------------------------------------------------------- + +async function handleProduce(form, stream) { + let count = Number.parseInt(form.get("count") || "1", 10); + if (!Number.isFinite(count) || count < 1) count = 1; + if (count > 500) count = 500; + const eventType = (form.get("type") || "").trim(); + const events = []; + for (let i = 0; i < count; i += 1) { + const picked = + eventType || EVENT_TYPES[Math.floor(Math.random() * EVENT_TYPES.length)]; + 
events.push([picked, fakePayload()]); + } + const ids = await stream.produceBatch(events); + return { status: 200, body: { produced: ids.length, ids } }; +} + +async function handleAddWorker(form, demo) { + const group = (form.get("group") || "").trim(); + const name = (form.get("name") || "").trim(); + if (!group || !name) { + return { status: 400, body: { error: "group and name are required" } }; + } + const added = await demo.addWorker(group, name); + if (!added) { + return { status: 409, body: { error: `${group}/${name} already exists` } }; + } + return { status: 200, body: { group, name } }; +} + +async function handleRemoveWorker(form, demo) { + const group = (form.get("group") || "").trim(); + const name = (form.get("name") || "").trim(); + const result = await demo.removeWorker(group, name); + const status = + result.removed || result.reason === "not-found" ? 200 : 409; + return { status, body: result }; +} + +async function handleCrash(form, demo) { + const group = (form.get("group") || "").trim(); + const name = (form.get("name") || "").trim(); + // Clamp to at least 1 so the reported `queued` value matches what + // `crashNext` actually adds (it ignores negative counts). + const count = Math.max(1, Number.parseInt(form.get("count") || "1", 10) || 1); + const worker = demo.getWorker(group, name); + if (!worker) { + return { + status: 404, + body: { error: `unknown consumer ${group}/${name}` }, + }; + } + worker.crashNext(count); + return { status: 200, body: { queued: count } }; +} + +async function handleAutoclaim(form, demo, stream) { + const group = (form.get("group") || "").trim(); + const consumer = (form.get("consumer") || "").trim(); + if (!group || !consumer) { + return { + status: 400, + body: { error: "group and consumer are required" }, + }; + } + const worker = demo.getWorker(group, consumer); + if (!worker) { + return { + status: 404, + body: { error: `unknown consumer ${group}/${consumer}` }, + }; + } + // `reapIdlePel` runs XAUTOCLAIM(self) + process + ack. `deletedIds` + // are PEL entries whose stream payload was already trimmed by + // `MAXLEN ~` before the sweep ran.
Redis 7+ removes them from the + // PEL inside XAUTOCLAIM itself, so the caller doesn't have to XACK + // them; in production they would be routed to a dead-letter store + // for offline inspection. + const result = await worker.reapIdlePel(); + return { + status: 200, + body: { + claimed: result.claimed, + processed: result.processed, + deleted: result.deletedIds, + min_idle_ms: stream.claimMinIdleMs, + }, + }; +} + +async function handleTrim(form, stream) { + const maxlen = Number.parseInt(form.get("maxlen") || "0", 10) || 0; + const deleted = await stream.trimMaxlen(maxlen); + return { status: 200, body: { deleted, maxlen } }; +} + +async function handleReplay(url, stream) { + const start = url.searchParams.get("start") || "-"; + const end = url.searchParams.get("end") || "+"; + let limit = Number.parseInt(url.searchParams.get("count") || "20", 10); + if (!Number.isFinite(limit) || limit < 1) limit = 20; + if (limit > 500) limit = 500; + const entries = await stream.replay(start, end, limit); + return { + status: 200, + body: { + start, + end, + limit, + entries: entries.map(([id, fields]) => ({ id, fields })), + }, + }; +} + +async function buildState(stream, demo) { + const [streamInfo, groups, tailEntries] = await Promise.all([ + stream.infoStream(), + stream.infoGroups(), + // XREVRANGE returns the *newest* N entries (in reverse order) — the + // tail view wants the most recent activity, not the head of history. 
+ stream.redis.xRevRange(stream.streamKey, "+", "-", { COUNT: 10 }), + ]); + + const groupsDetail = []; + const pendingRows = []; + const workersSnapshot = demo.workersSnapshot(); + + for (const group of groups) { + const name = group.name; + const consumerInfoRaw = await stream.infoConsumers(name); + const consumerInfo = new Map(consumerInfoRaw.map((c) => [c.name, c])); + const consumersDetail = []; + for (const entry of workersSnapshot) { + if (entry.group !== name) continue; + const info = consumerInfo.get(entry.name) || {}; + const status = entry.worker.status(); + consumersDetail.push({ + ...status, + pending: info.pending || 0, + idle_ms: info.idle_ms || 0, + recent: entry.worker.recent(), + }); + } + // Also include consumers that exist in Redis but not in our + // in-process registry (e.g. orphaned after a restart). + for (const [cName, info] of consumerInfo) { + if (!consumersDetail.some((c) => c.name === cName)) { + consumersDetail.push({ + name: cName, + group: name, + processed: 0, + reaped: 0, + crashed_drops: 0, + paused: false, + crash_queued: 0, + alive: false, + pending: info.pending || 0, + idle_ms: info.idle_ms || 0, + recent: [], + }); + } + } + consumersDetail.sort((a, b) => a.name.localeCompare(b.name)); + groupsDetail.push({ ...group, consumers_detail: consumersDetail }); + + const pending = await stream.pendingDetail(name, 50); + for (const row of pending) { + pendingRows.push({ + group: name, + consumer: row.consumer, + id: row.id, + idle_ms: row.idleMs, + deliveries: row.deliveries, + }); + } + } + + const tail = []; + for (const entry of tailEntries || []) { + if (entry && entry.id) tail.push({ id: entry.id, fields: entry.message || {} }); + } + + return { + stream: streamInfo, + tail, + groups: groupsDetail, + pending: pendingRows, + stats: stream.stats(), + }; +} + +async function main() { + const config = parseArgs(); + + const makeClient = async () => { + const client = createClient({ + socket: { host: config.redisHost, port: 
config.redisPort }, + }); + client.on("error", (err) => console.error("Redis error:", err.message || err)); + await client.connect(); + return client; + }; + + const mainClient = await makeClient(); + const stream = new EventStream({ + redisClient: mainClient, + streamKey: config.streamKey, + maxlenApprox: config.maxlen, + claimMinIdleMs: config.claimIdleMs, + }); + const demo = new StreamingDemo({ + stream, + makeClient, + maxlenApprox: config.maxlen, + claimMinIdleMs: config.claimIdleMs, + streamKey: config.streamKey, + }); + + if (config.resetOnStart) { + console.log( + `Deleting any existing data at key '${config.streamKey}'` + + " for a clean demo run (pass --no-reset to keep it).", + ); + await stream.deleteStream(); + } + const seeded = await demo.seed(DEFAULT_GROUPS); + + console.log( + `Redis streaming demo server listening on http://${config.host}:${config.port}`, + ); + console.log( + `Using Redis at ${config.redisHost}:${config.redisPort}` + + ` with stream key '${config.streamKey}' (MAXLEN ~ ${config.maxlen})`, + ); + console.log( + `Seeded ${seeded} consumer(s) across ${Object.keys(DEFAULT_GROUPS).length} group(s)`, + ); + + const server = http.createServer(async (req, res) => { + const url = new URL(req.url, `http://${req.headers.host || "localhost"}`); + + try { + if ( + req.method === "GET" && + (url.pathname === "/" || url.pathname === "/index.html") + ) { + res.writeHead(200, { "Content-Type": "text/html; charset=utf-8" }); + res.end(htmlPage(stream)); + return; + } + if (req.method === "GET" && url.pathname === "/state") { + const state = await buildState(stream, demo); + sendJson(res, 200, state); + return; + } + if (req.method === "GET" && url.pathname === "/replay") { + const r = await handleReplay(url, stream); + sendJson(res, r.status, r.body); + return; + } + + if (req.method === "POST" && url.pathname === "/produce") { + const form = new URLSearchParams(await readBody(req)); + const r = await handleProduce(form, stream); + sendJson(res, 
r.status, r.body); + return; + } + if (req.method === "POST" && url.pathname === "/add-worker") { + const form = new URLSearchParams(await readBody(req)); + const r = await handleAddWorker(form, demo); + sendJson(res, r.status, r.body); + return; + } + if (req.method === "POST" && url.pathname === "/remove-worker") { + const form = new URLSearchParams(await readBody(req)); + const r = await handleRemoveWorker(form, demo); + sendJson(res, r.status, r.body); + return; + } + if (req.method === "POST" && url.pathname === "/crash") { + const form = new URLSearchParams(await readBody(req)); + const r = await handleCrash(form, demo); + sendJson(res, r.status, r.body); + return; + } + if (req.method === "POST" && url.pathname === "/autoclaim") { + const form = new URLSearchParams(await readBody(req)); + const r = await handleAutoclaim(form, demo, stream); + sendJson(res, r.status, r.body); + return; + } + if (req.method === "POST" && url.pathname === "/trim") { + const form = new URLSearchParams(await readBody(req)); + const r = await handleTrim(form, stream); + sendJson(res, r.status, r.body); + return; + } + if (req.method === "POST" && url.pathname === "/reset") { + const count = await demo.reset(); + sendJson(res, 200, { consumers: count }); + return; + } + + res.writeHead(404, { "Content-Type": "text/plain" }); + res.end("Not Found"); + } catch (err) { + console.error("Request error:", err); + sendJson(res, 500, { error: (err && err.message) || "Internal error" }); + } + }); + + const shutdown = async () => { + console.log("\nShutting down..."); + await demo.stopAll(); + server.close(); + try { + await mainClient.quit(); + } catch { + // ignore + } + process.exit(0); + }; + process.on("SIGINT", shutdown); + process.on("SIGTERM", shutdown); + + server.listen(config.port, config.host); +} + +main().catch((err) => { + console.error(err); + process.exit(1); +}); diff --git a/content/develop/use-cases/streaming/nodejs/eventStream.js 
b/content/develop/use-cases/streaming/nodejs/eventStream.js new file mode 100644 index 0000000000..8b95a18de5 --- /dev/null +++ b/content/develop/use-cases/streaming/nodejs/eventStream.js @@ -0,0 +1,566 @@ +"use strict"; + +/** + * Redis event-stream helper backed by a single Redis Stream. + * + * Producers append events with `XADD`. Consumers belong to consumer + * groups and read with `XREADGROUP`. The group as a whole tracks a + * single `last-delivered-id` cursor, and each consumer gets its own + * pending-entries list (PEL) of in-flight messages it has been handed. + * Once a consumer has processed an entry it acknowledges it with + * `XACK`; entries left unacknowledged past an idle threshold can be + * swept to a healthy consumer with `XAUTOCLAIM` (or to a specific one + * with `XCLAIM`). + * + * Each `XADD` carries an approximate `MAXLEN` so the stream stays + * bounded as it rolls forward. `XRANGE` supports replay over the + * retained history for debugging, audit, or rebuilding a downstream + * projection. Note that approximate trimming can release entries that + * are still in a group's PEL: those entries appear in `XAUTOCLAIM`'s + * deletedMessages list, which the caller should log and route to a + * dead-letter store. Redis 7+ removes them from the PEL inside the + * `XAUTOCLAIM` call itself, so no explicit `XACK` is needed. + * + * The same stream can be read by any number of consumer groups — each + * group has its own cursor and its own pending lists, so analytics, + * notifications, and audit can all process the full event flow at + * their own pace without coordinating with each other. + */ + +/** + * @typedef {[string, Record]} Entry + */ + +/** + * Flatten the `XREADGROUP` reply (an array of `{name, messages}`) into + * a flat list of `[id, fields]` tuples. node-redis 5.x returns each + * message as `{id, message}` where `message` is already a plain + * object map of field/value strings. 
+ * + * @param {Array<{name: string, messages: Array<{id: string, message: Record}>}> | null} raw + * @returns {Entry[]} + */ +function flattenEntries(raw) { + if (!raw) return []; + const out = []; + for (const stream of raw) { + for (const entry of stream.messages || []) { + out.push([entry.id, entry.message || {}]); + } + } + return out; +} + +/** + * Coerce node-redis stream-message objects (`{id, message}`) into a + * `[id, fields]` tuple. Used by every read path that returns an array + * of stream messages (XAUTOCLAIM, XCLAIM, XRANGE). + * + * @param {{id: string, message: Record} | null} entry + * @returns {Entry | null} + */ +function toTuple(entry) { + if (!entry || !entry.id) return null; + return [entry.id, entry.message || {}]; +} + +class EventStream { + /** + * @param {object} options + * @param {import("redis").RedisClientType} options.redisClient + * @param {string} [options.streamKey="demo:events:orders"] + * @param {number} [options.maxlenApprox=10000] + * @param {number} [options.claimMinIdleMs=15000] + */ + constructor({ + redisClient, + streamKey = "demo:events:orders", + maxlenApprox = 10_000, + claimMinIdleMs = 15_000, + } = {}) { + if (!redisClient) { + throw new Error("A connected redisClient is required."); + } + this.redis = redisClient; + this.streamKey = streamKey; + this.maxlenApprox = maxlenApprox; + this.claimMinIdleMs = claimMinIdleMs; + + // Node.js is single-threaded for JS execution, so plain numbers + // are safe for counters. No lock needed. + this._producedTotal = 0; + this._ackedTotal = 0; + this._claimedTotal = 0; + } + + // -------------------------------------------------------------------- + // Producer + // -------------------------------------------------------------------- + + /** + * Append a single event. Returns the stream ID Redis assigned. 
+ * @param {string} eventType + * @param {Record} payload + * @returns {Promise} + */ + async produce(eventType, payload) { + const ids = await this.produceBatch([[eventType, payload]]); + return ids[0]; + } + + /** + * Pipeline several `XADD` calls in one round trip. + * + * Each entry carries an approximate `MAXLEN` cap. The `~` flavour + * lets Redis trim at a macro-node boundary, which is much cheaper + * than exact trimming and is the right call for a retention + * guardrail rather than a hard size limit. + * + * @param {Array<[string, Record]>} events + * @returns {Promise} + */ + async produceBatch(events) { + const list = Array.from(events); + if (list.length === 0) return []; + const pipe = this.redis.multi(); + for (const [eventType, payload] of list) { + const fields = EventStream._encodeFields(eventType, payload); + pipe.xAdd(this.streamKey, "*", fields, { + TRIM: { + strategy: "MAXLEN", + strategyModifier: "~", + threshold: this.maxlenApprox, + }, + }); + } + // execAsPipeline sends the commands in one round trip without + // wrapping them in MULTI/EXEC. + const ids = await pipe.execAsPipeline(); + this._producedTotal += ids.length; + return ids.map((id) => String(id)); + } + + /** + * Build the field/value map for an XADD. Every value is coerced + * to a string so node-redis doesn't reject mixed-type input. + * + * @param {string} eventType + * @param {Record} payload + * @returns {Record} + */ + static _encodeFields(eventType, payload) { + const fields = { + type: String(eventType), + ts_ms: String(Date.now()), + }; + if (payload) { + for (const [key, value] of Object.entries(payload)) { + fields[key] = value === null || value === undefined ? "" : String(value); + } + } + return fields; + } + + // -------------------------------------------------------------------- + // Consumer groups + // -------------------------------------------------------------------- + + /** + * Create the consumer group if it doesn't exist. 
+ * + * `$` means "deliver only events appended after this point"; pass + * `0-0` to replay the entire stream into a fresh group. + * + * @param {string} group + * @param {string} [startId="$"] + */ + async ensureGroup(group, startId = "$") { + try { + await this.redis.xGroupCreate(this.streamKey, group, startId, { + MKSTREAM: true, + }); + } catch (err) { + const msg = (err && err.message) || ""; + if (!msg.includes("BUSYGROUP")) { + throw err; + } + } + } + + /** + * @param {string} group + * @returns {Promise} + */ + async deleteGroup(group) { + try { + return Number(await this.redis.xGroupDestroy(this.streamKey, group)); + } catch { + return 0; + } + } + + /** + * Read new entries for this consumer via `XREADGROUP`. + * + * The `>` ID means "deliver entries this consumer group has not + * delivered to anyone yet" — that is the at-least-once path. + * Replaying an explicit ID instead would re-deliver an entry that + * is already in this consumer's pending list (see `consumeOwnPel` + * for that recovery path). + * + * @param {string} group + * @param {string} consumer + * @param {number} [count=10] + * @param {number} [blockMs=500] + * @returns {Promise} + */ + async consume(group, consumer, count = 10, blockMs = 500) { + const raw = await this.redis.xReadGroup( + group, + consumer, + [{ key: this.streamKey, id: ">" }], + { COUNT: count, BLOCK: blockMs }, + ); + return flattenEntries(raw); + } + + /** + * Re-deliver entries already in this consumer's PEL. + * + * Reading with an explicit ID (`0` here) instead of `>` replays the + * entries already assigned to this consumer name without advancing + * the group's `last-delivered-id`. This is the canonical recovery + * path after a restart on the same consumer name. Do NOT call this + * from the main read loop — every call resets the per-entry idle + * counter, which would keep crashed entries below the XAUTOCLAIM + * threshold forever. 
+ * + * @param {string} group + * @param {string} consumer + * @param {number} [count=10] + * @returns {Promise} + */ + async consumeOwnPel(group, consumer, count = 10) { + const raw = await this.redis.xReadGroup( + group, + consumer, + [{ key: this.streamKey, id: "0" }], + { COUNT: count }, + ); + return flattenEntries(raw); + } + + /** + * @param {string} group + * @param {Iterable} ids + * @returns {Promise} + */ + async ack(group, ids) { + const idList = Array.from(ids); + if (idList.length === 0) return 0; + const n = Number(await this.redis.xAck(this.streamKey, group, idList)); + this._ackedTotal += n; + return n; + } + + /** + * Sweep idle pending entries to `consumer`. + * + * A single `XAUTOCLAIM` call scans up to `pageCount` PEL entries + * starting at `startId` and returns a continuation cursor. For a + * full sweep of the PEL, loop until the cursor returns to `0-0` (or + * hit `maxPages` as a safety net so a very large PEL can't + * monopolise the call). + * + * Returns `{claimed, deletedIds}`. `deletedIds` are PEL entries + * whose stream payload had already been trimmed by the time this + * sweep ran (typically because `MAXLEN ~` retention outran a slow + * consumer). `XAUTOCLAIM` removes those dangling slots from the PEL + * itself — the caller does **not** need to `XACK` them — but they + * cannot be retried, so log and route them to a dead-letter store + * for observability. 
+ * + * @param {string} group + * @param {string} consumer + * @param {object} [options] + * @param {number} [options.pageCount=100] + * @param {string} [options.startId="0-0"] + * @param {number} [options.maxPages=10] + * @returns {Promise<{claimed: Entry[], deletedIds: string[]}>} + */ + async autoclaim(group, consumer, options = {}) { + const { pageCount = 100, startId = "0-0", maxPages = 10 } = options; + const claimedAll = []; + const deletedAll = []; + let cursor = startId; + for (let i = 0; i < maxPages; i += 1) { + const reply = await this.redis.xAutoClaim( + this.streamKey, + group, + consumer, + this.claimMinIdleMs, + cursor, + { COUNT: pageCount }, + ); + for (const entry of reply.messages || []) { + const tuple = toTuple(entry); + if (tuple) claimedAll.push(tuple); + } + for (const id of reply.deletedMessages || []) { + deletedAll.push(String(id)); + } + const nextId = String(reply.nextId || "0-0"); + if (nextId === "0-0") break; + cursor = nextId; + } + this._claimedTotal += claimedAll.length; + return { claimed: claimedAll, deletedIds: deletedAll }; + } + + /** + * Drop a consumer from a group. + * + * `XGROUP DELCONSUMER` destroys this consumer's PEL entries — any + * entry it still owned is no longer tracked anywhere in the group, + * and `XAUTOCLAIM` will never find it again. Always `handoverPending` + * (or `XCLAIM` manually) to a healthy consumer first; this method is + * the raw destructive call and is exposed only for explicit cleanup. + * + * @param {string} group + * @param {string} consumer + * @returns {Promise} + */ + async deleteConsumer(group, consumer) { + try { + return Number( + await this.redis.xGroupDelConsumer(this.streamKey, group, consumer), + ); + } catch { + return 0; + } + } + + /** + * Move every PEL entry owned by `fromConsumer` to `toConsumer`. + * + * Enumerates the source consumer's PEL with `XPENDING ... CONSUMER` + * and reassigns each ID with `XCLAIM` at zero idle time so the move + * is unconditional. 
(`XAUTOCLAIM` does not filter by source consumer, + * so it cannot be used for a per-consumer handover.) + * + * Call this before `deleteConsumer` whenever the source still has + * pending entries — otherwise `XGROUP DELCONSUMER` would silently + * destroy them and they could never be recovered. + * + * @param {string} group + * @param {string} fromConsumer + * @param {string} toConsumer + * @param {number} [batch=100] + * @returns {Promise} + */ + async handoverPending(group, fromConsumer, toConsumer, batch = 100) { + let totalClaimed = 0; + while (true) { + // Errors from XPENDING / XCLAIM propagate up. Swallowing them + // and returning a partial count would let the caller think the + // handover succeeded; the caller's next step is XGROUP + // DELCONSUMER, which would destroy whatever entries were left + // in the source's PEL. + const rows = await this.redis.xPendingRange( + this.streamKey, + group, + "-", + "+", + batch, + { consumer: fromConsumer }, + ); + if (!rows || rows.length === 0) break; + const ids = rows.map((row) => String(row.id)); + const claimed = await this.redis.xClaim( + this.streamKey, + group, + toConsumer, + 0, + ids, + ); + totalClaimed += Array.isArray(claimed) ? claimed.length : 0; + if (rows.length < batch) break; + } + this._claimedTotal += totalClaimed; + return totalClaimed; + } + + // -------------------------------------------------------------------- + // Replay, length, trim + // -------------------------------------------------------------------- + + /** + * Range read with `XRANGE` for replay or audit. + * + * Read-only: ranges do not update any group cursor and do not ack + * anything. Useful for bootstrapping a new projection, for building + * an audit view, or for debugging what actually went through the + * stream. 
+ * + * @param {string} [startId="-"] + * @param {string} [endId="+"] + * @param {number} [count=100] + * @returns {Promise} + */ + async replay(startId = "-", endId = "+", count = 100) { + const raw = await this.redis.xRange(this.streamKey, startId, endId, { + COUNT: count, + }); + const out = []; + for (const entry of raw || []) { + const tuple = toTuple(entry); + if (tuple) out.push(tuple); + } + return out; + } + + /** @returns {Promise} */ + async length() { + return Number(await this.redis.xLen(this.streamKey)); + } + + /** + * @param {number} maxlen + * @returns {Promise} + */ + async trimMaxlen(maxlen) { + return Number( + await this.redis.xTrim(this.streamKey, "MAXLEN", maxlen, { + strategyModifier: "~", + }), + ); + } + + /** + * @param {string} minid + * @returns {Promise} + */ + async trimMinid(minid) { + return Number( + await this.redis.xTrim(this.streamKey, "MINID", minid, { + strategyModifier: "~", + }), + ); + } + + // -------------------------------------------------------------------- + // Inspection + // -------------------------------------------------------------------- + + /** Subset of XINFO STREAM that's safe to JSON-encode. */ + async infoStream() { + let raw; + try { + raw = await this.redis.xInfoStream(this.streamKey); + } catch { + return { + length: 0, + last_generated_id: null, + first_entry_id: null, + last_entry_id: null, + }; + } + const first = raw["first-entry"]; + const last = raw["last-entry"]; + return { + length: Number(raw.length || 0), + last_generated_id: raw["last-generated-id"] + ? String(raw["last-generated-id"]) + : null, + first_entry_id: first && first.id ? String(first.id) : null, + last_entry_id: last && last.id ? 
String(last.id) : null, + }; + } + + async infoGroups() { + let rows; + try { + rows = await this.redis.xInfoGroups(this.streamKey); + } catch { + return []; + } + return (rows || []).map((row) => ({ + name: String(row.name), + consumers: Number(row.consumers || 0), + pending: Number(row.pending || 0), + last_delivered_id: row["last-delivered-id"] + ? String(row["last-delivered-id"]) + : null, + lag: row.lag === null || row.lag === undefined ? null : Number(row.lag), + })); + } + + /** + * @param {string} group + */ + async infoConsumers(group) { + let rows; + try { + rows = await this.redis.xInfoConsumers(this.streamKey, group); + } catch { + return []; + } + return (rows || []).map((row) => ({ + name: String(row.name), + pending: Number(row.pending || 0), + idle_ms: Number(row.idle || 0), + })); + } + + /** + * Per-entry PEL view (id, consumer, idleMs, deliveries). + * + * @param {string} group + * @param {number} [count=20] + */ + async pendingDetail(group, count = 20) { + let rows; + try { + rows = await this.redis.xPendingRange( + this.streamKey, + group, + "-", + "+", + count, + ); + } catch { + return []; + } + return (rows || []).map((row) => ({ + id: String(row.id), + consumer: String(row.consumer), + idleMs: Number(row.millisecondsSinceLastDelivery || 0), + deliveries: Number(row.deliveriesCounter || 0), + })); + } + + stats() { + return { + produced_total: this._producedTotal, + acked_total: this._ackedTotal, + claimed_total: this._claimedTotal, + }; + } + + resetStats() { + this._producedTotal = 0; + this._ackedTotal = 0; + this._claimedTotal = 0; + } + + // -------------------------------------------------------------------- + // Demo housekeeping + // -------------------------------------------------------------------- + + /** Drop the stream key entirely. Used by the demo's reset path. 
*/ + async deleteStream() { + await this.redis.del(this.streamKey); + } +} + +module.exports = { EventStream }; diff --git a/content/develop/use-cases/streaming/nodejs/package.json b/content/develop/use-cases/streaming/nodejs/package.json new file mode 100644 index 0000000000..e56db38883 --- /dev/null +++ b/content/develop/use-cases/streaming/nodejs/package.json @@ -0,0 +1,16 @@ +{ + "name": "redis-streaming-nodejs-demo", + "version": "1.0.0", + "private": true, + "description": "Redis streaming demo with node-redis and the Node.js standard http module.", + "main": "demoServer.js", + "scripts": { + "start": "node demoServer.js" + }, + "dependencies": { + "redis": "^5.0.0" + }, + "engines": { + "node": ">=18" + } +} diff --git a/content/develop/use-cases/streaming/php/ConsumerWorker.php b/content/develop/use-cases/streaming/php/ConsumerWorker.php new file mode 100644 index 0000000000..b2509cc594 --- /dev/null +++ b/content/develop/use-cases/streaming/php/ConsumerWorker.php @@ -0,0 +1,483 @@ +` (short block), process each entry, + * `XACK`. Recovery of stuck PEL entries (this consumer's, or anyone + * else's) happens through `reapIdlePel()`, which is the textbook + * Streams pattern: each consumer periodically calls `XAUTOCLAIM` + * with itself as the target, then processes whatever it claimed. + * + * Two demo-only levers, both Redis-backed so the demo server can + * flip them across the process boundary: + * + * * `demo:streaming:worker:{group}:{name}:paused` — non-zero + * parks the worker. The worker writes + * `demo:streaming:worker:{group}:{name}:idle = 1` while it's + * parked so the demo server can wait for a clean stop. + * * `demo:streaming:worker:{group}:{name}:crash_next` — an integer + * counter the worker decrements per delivery; while > 0, the + * worker drops the entry on the floor without acking it. This + * simulates a crash mid-message — the entries stay in the PEL + * and become eligible for `XAUTOCLAIM` once they age past the + * idle threshold. 
+ * + * Per-worker observables (`processed`, `reaped`, `crashed_drops`, + * `recent`) live in Redis under + * `demo:streaming:worker:{group}:{name}:*` so the demo server can + * read them. + * + * This file is dual-purpose: it defines the `ConsumerWorker` class + * (used by the demo server's autoclaim handler to run a reap on a + * named consumer), and when invoked from the CLI it runs the worker + * loop in a long-lived background process. + */ + +declare(strict_types=1); + +use Predis\Client as PredisClient; +use Predis\ClientInterface; + +require_once __DIR__ . '/EventStream.php'; + +class ConsumerWorker +{ + private EventStream $stream; + private string $group; + private string $name; + private int $processLatencyMs; + private int $recentCapacity; + private ClientInterface $redis; + + public function __construct( + EventStream $stream, + string $group, + string $name, + int $processLatencyMs = 25, + int $recentCapacity = 20 + ) { + $this->stream = $stream; + $this->group = $group; + $this->name = $name; + $this->processLatencyMs = $processLatencyMs; + $this->recentCapacity = $recentCapacity; + $this->redis = $stream->client(); + } + + public function group(): string + { + return $this->group; + } + + public function name(): string + { + return $this->name; + } + + // ------------------------------------------------------------------ + // Demo levers — written by the demo server, read by the worker + // ------------------------------------------------------------------ + + public function pause(): void + { + $this->redis->set(EventStream::workerKey($this->group, $this->name, 'paused'), '1'); + } + + public function resume(): void + { + $this->redis->del([ + EventStream::workerKey($this->group, $this->name, 'paused'), + EventStream::workerKey($this->group, $this->name, 'idle'), + ]); + } + + /** + * Drop the next `$count` deliveries without acking them. 
+ * + * The entries stay in the group's PEL with their delivery counter + * incremented, so `XAUTOCLAIM` can recover them once they exceed + * the idle threshold. + */ + public function crashNext(int $count): void + { + if ($count <= 0) { + return; + } + $this->redis->incrby( + EventStream::workerKey($this->group, $this->name, 'crash_next'), + $count + ); + } + + // ------------------------------------------------------------------ + // Introspection (read by demo server's /state) + // ------------------------------------------------------------------ + + /** + * @return list> + */ + public function recent(): array + { + $rawList = $this->redis->lrange( + EventStream::workerKey($this->group, $this->name, 'recent'), + 0, + $this->recentCapacity - 1 + ); + $out = []; + foreach ((array) $rawList as $line) { + $decoded = json_decode((string) $line, true); + if (is_array($decoded)) { + $out[] = $decoded; + } + } + return $out; + } + + /** + * @return array + */ + public function status(): array + { + $pid = (int) $this->redis->get(EventStream::workerKey($this->group, $this->name, 'pid')); + $processed = (int) $this->redis->get(EventStream::workerKey($this->group, $this->name, 'processed')); + $reaped = (int) $this->redis->get(EventStream::workerKey($this->group, $this->name, 'reaped')); + $crashedDrops = (int) $this->redis->get(EventStream::workerKey($this->group, $this->name, 'crashed_drops')); + $crashQueued = (int) $this->redis->get(EventStream::workerKey($this->group, $this->name, 'crash_next')); + $paused = ((string) $this->redis->get(EventStream::workerKey($this->group, $this->name, 'paused'))) === '1'; + $alive = $pid > 0 && self::isAlive($pid); + + return [ + 'name' => $this->name, + 'group' => $this->group, + 'processed' => $processed, + 'reaped' => $reaped, + 'crashed_drops' => $crashedDrops, + 'paused' => $paused, + 'crash_queued' => $crashQueued, + 'alive' => $alive, + 'pid' => $pid, + ]; + } + + // 
------------------------------------------------------------------ + // Recovery — runs in whichever PHP process calls it + // ------------------------------------------------------------------ + + /** + * Run `XAUTOCLAIM` into this consumer and process the claimed entries. + * + * Safe to call from the demo server. The heavy lifting is + * `EventStream::autoclaim()` (a Redis call) and the sequential + * per-entry handling. + * + * `deletedIds` are PEL entries whose stream payload was already + * trimmed by `MAXLEN ~`/`XTRIM` before the sweep ran. Redis 7+ + * removes them from the PEL inside `XAUTOCLAIM` itself, so the + * caller does not have to `XACK` them; they are reported so the + * caller can route them to a dead-letter store. + * + * @return array{claimed:int, processed:int, deleted_ids:list} + */ + public function reapIdlePel(): array + { + $result = $this->stream->autoclaim($this->group, $this->name, 100, '0-0', 10); + $claimed = $result['claimed']; + $deletedIds = $result['deletedIds']; + + $processed = 0; + foreach ($claimed as [$entryId, $fields]) { + try { + $this->handleEntry($entryId, $fields, /*viaReap*/ true); + $processed++; + } catch (\Throwable $exc) { + fwrite(STDERR, "[{$this->group}/{$this->name}] reap failed on {$entryId}: " . $exc->getMessage() . 
"\n"); + } + } + if ($processed > 0) { + $this->redis->incrby( + EventStream::workerKey($this->group, $this->name, 'reaped'), + $processed + ); + } + return [ + 'claimed' => count($claimed), + 'processed' => $processed, + 'deleted_ids' => $deletedIds, + ]; + } + + // ------------------------------------------------------------------ + // Main loop — runs only in the spawned worker process + // ------------------------------------------------------------------ + + public function run(): void + { + $stop = false; + if (function_exists('pcntl_async_signals')) { + pcntl_async_signals(true); + pcntl_signal(SIGTERM, function () use (&$stop) { $stop = true; }); + pcntl_signal(SIGINT, function () use (&$stop) { $stop = true; }); + } + + $pausedKey = EventStream::workerKey($this->group, $this->name, 'paused'); + $idleKey = EventStream::workerKey($this->group, $this->name, 'idle'); + + // Record our PID so the demo server can kill us. + $this->redis->set( + EventStream::workerKey($this->group, $this->name, 'pid'), + (string) getmypid() + ); + + while (!$stop) { + // Cross-process pause: the demo server flips `paused=1` + // and waits for `idle=1` before doing surgery on this + // worker's group. The worker writes `idle=1` only while + // it's parked, and clears it as soon as it resumes. + if ((string) $this->redis->get($pausedKey) === '1') { + $this->redis->set($idleKey, '1'); + usleep(20 * 1000); + continue; + } else { + // Clear the idle flag eagerly so a previous pause + // doesn't linger after resume(). + $this->redis->del([$idleKey]); + } + + try { + $entries = $this->stream->consume($this->group, $this->name, 10, 500); + } catch (\Throwable $exc) { + fwrite(STDERR, "[{$this->group}/{$this->name}] read failed: " . $exc->getMessage() . 
"\n"); + usleep(500 * 1000); + continue; + } + + foreach ($entries as [$entryId, $fields]) { + if ($stop) { + break; + } + if ($this->processLatencyMs > 0) { + usleep($this->processLatencyMs * 1000); + } + try { + $this->handleEntry($entryId, $fields, /*viaReap*/ false); + } catch (\Throwable $exc) { + // A failure here (typically XACK against Redis) + // must not kill the process — the entry stays + // unacked, the next reapIdlePel call (here or on + // any consumer in the group) can recover it once + // it exceeds the idle threshold. + fwrite(STDERR, "[{$this->group}/{$this->name}] failed to handle {$entryId}: " . $exc->getMessage() . "\n"); + $this->appendRecent([ + 'id' => $entryId, + 'type' => $fields['type'] ?? '', + 'fields' => $fields, + 'acked' => false, + 'note' => 'handler error: ' . $exc->getMessage(), + ]); + } + } + } + + // Clear the idle flag so the demo server's pause-wait doesn't + // hang on a stopped worker. + $this->redis->del([$idleKey]); + fwrite(STDERR, "[{$this->group}/{$this->name}] stopped pid=" . getmypid() . "\n"); + } + + /** + * @param array $fields + */ + private function handleEntry(string $entryId, array $fields, bool $viaReap): void + { + // Crash simulation: DECR returns the new value. We claim a + // "slot" only if the pre-decrement value was > 0. Doing this + // via a single DECRBY would let us race past zero, so use a + // tiny Lua script to do the conditional decrement atomically. + if (!$viaReap) { + $shouldDrop = $this->tryConsumeCrashSlot(); + if ($shouldDrop) { + $this->redis->incr(EventStream::workerKey($this->group, $this->name, 'crashed_drops')); + $this->appendRecent([ + 'id' => $entryId, + 'type' => $fields['type'] ?? 
'', + 'fields' => $fields, + 'acked' => false, + 'note' => 'dropped (simulated crash)', + ]); + return; + } + } + + $this->stream->ack($this->group, [$entryId]); + if (!$viaReap) { + $this->redis->incr(EventStream::workerKey($this->group, $this->name, 'processed')); + } + $this->appendRecent([ + 'id' => $entryId, + 'type' => $fields['type'] ?? '', + 'fields' => $fields, + 'acked' => true, + 'note' => $viaReap ? 'reaped' : '', + ]); + } + + /** + * Atomically decrement the `crash_next` counter iff it's > 0. + * Returns true when a slot was consumed (meaning: drop this entry). + */ + private function tryConsumeCrashSlot(): bool + { + $key = EventStream::workerKey($this->group, $this->name, 'crash_next'); + // EVAL via Predis: pass keys count first, then keys, then args. + $lua = <<<'LUA' +local v = tonumber(redis.call('GET', KEYS[1]) or '0') +if v > 0 then + redis.call('DECR', KEYS[1]) + return 1 +end +return 0 +LUA; + $result = $this->redis->eval($lua, 1, $key); + return ((int) $result) === 1; + } + + /** + * @param array $record + */ + private function appendRecent(array $record): void + { + $recentKey = EventStream::workerKey($this->group, $this->name, 'recent'); + $line = json_encode($record, JSON_UNESCAPED_SLASHES); + if (!is_string($line)) { + return; + } + $pipe = $this->redis->pipeline(); + $pipe->lpush($recentKey, [$line]); + $pipe->ltrim($recentKey, 0, $this->recentCapacity - 1); + $pipe->execute(); + } + + // ------------------------------------------------------------------ + // Static helpers + // ------------------------------------------------------------------ + + public static function isAlive(int $pid): bool + { + if ($pid <= 0 || !function_exists('posix_kill')) { + return false; + } + // Signal 0 doesn't deliver; it just checks reachability. 
+ return @posix_kill($pid, 0); + } + + public static function deleteWorkerState(ClientInterface $redis, string $group, string $name): void + { + $redis->del([ + EventStream::workerKey($group, $name, 'pid'), + EventStream::workerKey($group, $name, 'processed'), + EventStream::workerKey($group, $name, 'reaped'), + EventStream::workerKey($group, $name, 'crashed_drops'), + EventStream::workerKey($group, $name, 'crash_next'), + EventStream::workerKey($group, $name, 'paused'), + EventStream::workerKey($group, $name, 'idle'), + EventStream::workerKey($group, $name, 'recent'), + ]); + } +} + +// --------------------------------------------------------------------- +// CLI entry point +// --------------------------------------------------------------------- + +// Run only when invoked as a script, not when require_once'd. +if (PHP_SAPI === 'cli' && isset($argv) && realpath($argv[0]) === __FILE__) { + // Composer autoloader for Predis when this file is the CLI entry. + $autoload = __DIR__ . '/vendor/autoload.php'; + if (!file_exists($autoload)) { + fwrite(STDERR, "[consumer_worker] missing vendor/autoload.php — run 'composer install' in the demo directory first\n"); + exit(1); + } + require_once $autoload; + + $opts = [ + 'group' => '', + 'name' => '', + 'redis-host' => '127.0.0.1', + 'redis-port' => 6379, + 'stream-key' => 'demo:events:orders', + 'maxlen' => 10000, + 'claim-idle-ms' => 15000, + 'process-latency-ms' => 25, + ]; + + $count = count($argv); + for ($i = 1; $i < $count; $i++) { + $arg = $argv[$i]; + if (strpos($arg, '--') !== 0) { + continue; + } + $key = substr($arg, 2); + $eq = strpos($key, '='); + if ($eq !== false) { + $value = substr($key, $eq + 1); + $key = substr($key, 0, $eq); + } elseif ($i + 1 < $count) { + $value = $argv[++$i]; + } else { + $value = ''; + } + if (!array_key_exists($key, $opts)) { + fwrite(STDERR, "[consumer_worker] unknown option --{$key}\n"); + exit(2); + } + $opts[$key] = $value; + } + + if ($opts['group'] === '' || $opts['name'] === 
'') { + fwrite(STDERR, "[consumer_worker] --group and --name are required\n"); + exit(2); + } + + $redis = new PredisClient([ + 'host' => (string) $opts['redis-host'], + 'port' => (int) $opts['redis-port'], + // The XREADGROUP block window is half a second, so the socket + // read timeout must never fire before that. A value of 0 disables + // the timeout entirely, which is the setting used here. + 'read_write_timeout' => 0, + ]); + + try { + $redis->ping(); + } catch (\Throwable $exc) { + fwrite(STDERR, "[consumer_worker {$opts['group']}/{$opts['name']}] cannot reach Redis: " . $exc->getMessage() . "\n"); + exit(1); + } + + $stream = new EventStream( + $redis, + (string) $opts['stream-key'], + (int) $opts['maxlen'], + (int) $opts['claim-idle-ms'] + ); + // The group should already exist (the demo server seeds it on + // start), but ensureGroup is idempotent. + $stream->ensureGroup((string) $opts['group'], '0-0'); + + $worker = new ConsumerWorker( + $stream, + (string) $opts['group'], + (string) $opts['name'], + (int) $opts['process-latency-ms'] + ); + + fwrite(STDERR, "[consumer_worker {$opts['group']}/{$opts['name']}] starting pid=" . getmypid() .
"\n"); + $worker->run(); +} diff --git a/content/develop/use-cases/streaming/php/EventStream.php b/content/develop/use-cases/streaming/php/EventStream.php new file mode 100644 index 0000000000..18b6456aea --- /dev/null +++ b/content/develop/use-cases/streaming/php/EventStream.php @@ -0,0 +1,674 @@ +redis = $redis; + $this->streamKey = $streamKey; + $this->maxlenApprox = $maxlenApprox; + $this->claimMinIdleMs = $claimMinIdleMs; + } + + public function streamKey(): string + { + return $this->streamKey; + } + + public function maxlenApprox(): int + { + return $this->maxlenApprox; + } + + public function claimMinIdleMs(): int + { + return $this->claimMinIdleMs; + } + + public function client(): ClientInterface + { + return $this->redis; + } + + // ------------------------------------------------------------------ + // Producer + // ------------------------------------------------------------------ + + /** + * Append a single event. Returns the stream ID Redis assigned. + * + * @param array $payload + */ + public function produce(string $eventType, array $payload): string + { + $ids = $this->produceBatch([[$eventType, $payload]]); + return $ids[0]; + } + + /** + * Pipeline several `XADD` calls in one round trip. + * + * Each entry carries an approximate `MAXLEN` cap. The `~` flavour + * lets Redis trim at a macro-node boundary, which is much cheaper + * than exact trimming and is the right call for a retention + * guardrail rather than a hard size limit. + * + * @param list}> $events + * @return list + */ + public function produceBatch(array $events): array + { + if (empty($events)) { + return []; + } + // Predis pipelines without MULTI/EXEC return the raw results in + // order. The fluent pipeline closure form is the simplest way + // to drive a non-transactional pipeline in Predis 3.x. 
+ $stream = $this->streamKey; + $maxlen = $this->maxlenApprox; + $results = $this->redis->pipeline(function ($pipe) use ($events, $stream, $maxlen) { + foreach ($events as [$eventType, $payload]) { + $fields = self::encodeFields($eventType, $payload); + // XADD {trim => [MAXLEN, ~, n]} + $pipe->xadd($stream, $fields, '*', ['trim' => ['MAXLEN', '~', $maxlen]]); + } + }); + + $ids = []; + foreach ($results as $id) { + $ids[] = (string) $id; + } + $this->redis->hincrby(self::STATS_KEY, 'produced_total', count($ids)); + return $ids; + } + + /** + * @param array $payload + * @return array + */ + private static function encodeFields(string $eventType, array $payload): array + { + $fields = [ + 'type' => $eventType, + 'ts_ms' => (string) self::nowMs(), + ]; + foreach ($payload as $k => $v) { + $fields[(string) $k] = $v === null ? '' : (string) $v; + } + return $fields; + } + + // ------------------------------------------------------------------ + // Consumer groups + // ------------------------------------------------------------------ + + /** + * Create the consumer group if it doesn't exist. + * + * `$` means "deliver only events appended after this point"; pass + * `0-0` to replay the entire stream into a fresh group. + */ + public function ensureGroup(string $group, string $startId = '$'): void + { + try { + // XGROUP CREATE MKSTREAM + $this->redis->xgroup('CREATE', $this->streamKey, $group, $startId, true); + } catch (ServerException $exc) { + if (strpos($exc->getMessage(), 'BUSYGROUP') === false) { + throw $exc; + } + } + } + + public function deleteGroup(string $group): int + { + try { + return (int) $this->redis->xgroup('DESTROY', $this->streamKey, $group); + } catch (ServerException $exc) { + return 0; + } + } + + /** + * Read new entries for this consumer via `XREADGROUP`. + * + * The `>` ID means "deliver entries this consumer group has not + * delivered to *anyone* yet" — that is the at-least-once path. 
+ * Replaying an explicit ID instead would re-deliver an entry that + * is already in this consumer's pending list (see + * `consumeOwnPel` for that recovery path). + * + * @return list}> + */ + public function consume(string $group, string $consumer, int $count = 10, int $blockMs = 500): array + { + // Predis xreadgroup signature: + // xreadgroup(group, consumer, count, block, noack, key, id) + $raw = $this->redis->xreadgroup( + $group, + $consumer, + $count, + $blockMs, + false, + $this->streamKey, + '>' + ); + return self::flattenReadGroup($raw); + } + + /** + * Re-deliver entries already in this consumer's PEL. + * + * Reading with an explicit ID (`0` here) instead of `>` replays + * the entries already assigned to this consumer name without + * advancing the group's `last-delivered-id`. This is the + * canonical recovery path after a crash on the same consumer + * name, and is also how a consumer picks up entries that another + * consumer (or `XAUTOCLAIM`) handed to it. + * + * @return list}> + */ + public function consumeOwnPel(string $group, string $consumer, int $count = 10): array + { + $raw = $this->redis->xreadgroup( + $group, + $consumer, + $count, + null, + false, + $this->streamKey, + '0' + ); + return self::flattenReadGroup($raw); + } + + /** + * `XACK` a list of IDs. Returns how many Redis confirmed. + * + * @param list $ids + */ + public function ack(string $group, array $ids): int + { + if (empty($ids)) { + return 0; + } + // Predis 3.x XACK takes variadic IDs, not an array. + $n = (int) $this->redis->xack($this->streamKey, $group, ...$ids); + if ($n > 0) { + $this->redis->hincrby(self::STATS_KEY, 'acked_total', $n); + } + return $n; + } + + /** + * Sweep idle pending entries to `$consumer`. + * + * A single `XAUTOCLAIM` call scans up to `$pageCount` PEL entries + * starting at `$startId` and returns a continuation cursor. 
For a + * full sweep, loop until the cursor returns to `0-0` (with a + * `$maxPages` safety net so a very large PEL can't monopolise the + * call). + * + * Returns `['claimed' => [...], 'deletedIds' => [...]]`. The + * `deletedIds` are PEL entries whose stream payload had already + * been trimmed (typically because `MAXLEN ~` retention outran a + * slow consumer). `XAUTOCLAIM` removes those dangling slots from + * the PEL itself — the caller does **not** need to `XACK` them — + * but they cannot be retried either, so log and route them to a + * dead-letter store. + * + * @return array{claimed: list}>, deletedIds: list} + */ + public function autoclaim( + string $group, + string $consumer, + int $pageCount = 100, + string $startId = '0-0', + int $maxPages = 10 + ): array { + $claimedAll = []; + $deletedAll = []; + $cursor = $startId; + + for ($i = 0; $i < $maxPages; $i++) { + $reply = $this->redis->xautoclaim( + $this->streamKey, + $group, + $consumer, + $this->claimMinIdleMs, + $cursor, + $pageCount + ); + // Reply shape: [nextCursor, [[id, [k,v,k,v,...]], ...], [deletedIds...]] + $nextCursor = isset($reply[0]) ? (string) $reply[0] : '0-0'; + $claimedRaw = $reply[1] ?? []; + $deletedRaw = $reply[2] ?? []; + + foreach ($claimedRaw as $entry) { + if (!is_array($entry) || !isset($entry[0])) { + continue; + } + $id = (string) $entry[0]; + $fields = self::pairsToDict(is_array($entry[1] ?? null) ? $entry[1] : []); + $claimedAll[] = [$id, $fields]; + } + foreach ($deletedRaw as $id) { + $deletedAll[] = (string) $id; + } + + if ($nextCursor === '0-0') { + break; + } + $cursor = $nextCursor; + } + + if (!empty($claimedAll)) { + $this->redis->hincrby(self::STATS_KEY, 'claimed_total', count($claimedAll)); + } + + return ['claimed' => $claimedAll, 'deletedIds' => $deletedAll]; + } + + /** + * Drop a consumer from a group. 
+ * + * `XGROUP DELCONSUMER` destroys this consumer's PEL entries — any + * entry it still owned is no longer tracked anywhere in the group + * and `XAUTOCLAIM` will never find it again. Always + * `handoverPending` (or `XCLAIM` it manually) to a healthy + * consumer first; this method is the raw destructive call and is + * exposed only for explicit cleanup. + */ + public function deleteConsumer(string $group, string $consumer): int + { + try { + return (int) $this->redis->xgroup('DELCONSUMER', $this->streamKey, $group, $consumer); + } catch (ServerException $exc) { + return 0; + } + } + + /** + * Move every PEL entry owned by `$fromConsumer` to `$toConsumer`. + * + * Enumerates the source consumer's PEL with + * `XPENDING ... CONSUMER` and reassigns each ID with `XCLAIM` at + * zero idle time so the move is unconditional. (`XAUTOCLAIM` does + * not filter by source consumer, so it cannot be used for a + * per-consumer handover.) + * + * Call this before `deleteConsumer` whenever the source still has + * pending entries — otherwise `XGROUP DELCONSUMER` would silently + * destroy them and they could never be recovered. + * + * @return int Number of entries successfully claimed by `$toConsumer`. + */ + public function handoverPending( + string $group, + string $fromConsumer, + string $toConsumer, + int $batch = 100 + ): int { + $claimedTotal = 0; + + while (true) { + // XPENDING [IDLE ms] [consumer] + $rows = $this->redis->xpending( + $this->streamKey, + $group, + null, + '-', + '+', + $batch, + $fromConsumer + ); + if (!is_array($rows) || empty($rows)) { + break; + } + $ids = []; + foreach ($rows as $row) { + if (isset($row[0])) { + $ids[] = (string) $row[0]; + } + } + if (empty($ids)) { + break; + } + // XCLAIM + // Predis 3.x XCLAIM takes the IDs as the 5th positional + // argument (array or scalar) — see XCLAIM::setArguments. 
+ $claimed = $this->redis->xclaim( + $this->streamKey, + $group, + $toConsumer, + 0, + $ids + ); + if (is_array($claimed)) { + $claimedTotal += count($claimed); + } + if (count($rows) < $batch) { + break; + } + } + + if ($claimedTotal > 0) { + $this->redis->hincrby(self::STATS_KEY, 'claimed_total', $claimedTotal); + } + return $claimedTotal; + } + + // ------------------------------------------------------------------ + // Replay, length, trim + // ------------------------------------------------------------------ + + /** + * Range read with `XRANGE` for replay or audit. + * + * Read-only: ranges do not update any group cursor and do not ack + * anything. Useful for bootstrapping a new projection, for + * building an audit view, or for debugging what actually went + * through the stream. + * + * @return list}> + */ + public function replay(string $startId = '-', string $endId = '+', int $count = 100): array + { + $raw = $this->redis->xrange($this->streamKey, $startId, $endId, $count); + $out = []; + foreach ($raw as $id => $fields) { + $out[] = [(string) $id, is_array($fields) ? $fields : []]; + } + return $out; + } + + /** + * @return list}> + */ + public function tail(int $count = 10): array + { + $raw = $this->redis->xrevrange($this->streamKey, '+', '-', $count); + $out = []; + foreach ($raw as $id => $fields) { + $out[] = [(string) $id, is_array($fields) ? 
$fields : []]; + } + return $out; + } + + public function length(): int + { + return (int) $this->redis->xlen($this->streamKey); + } + + public function trimMaxlen(int $maxlen): int + { + return (int) $this->redis->xtrim($this->streamKey, ['MAXLEN', '~'], $maxlen); + } + + public function trimMinid(string $minid): int + { + return (int) $this->redis->xtrim($this->streamKey, ['MINID', '~'], $minid); + } + + // ------------------------------------------------------------------ + // Inspection + // ------------------------------------------------------------------ + + /** + * @return array + */ + public function infoStream(): array + { + try { + $raw = $this->redis->xinfo('STREAM', $this->streamKey); + } catch (ServerException $exc) { + return [ + 'length' => 0, + 'last_generated_id' => null, + 'first_entry_id' => null, + 'last_entry_id' => null, + ]; + } + $first = $raw['first-entry'] ?? null; + $last = $raw['last-entry'] ?? null; + return [ + 'length' => (int) ($raw['length'] ?? 0), + 'last_generated_id' => $raw['last-generated-id'] ?? null, + 'first_entry_id' => is_array($first) && isset($first[0]) ? (string) $first[0] : null, + 'last_entry_id' => is_array($last) && isset($last[0]) ? (string) $last[0] : null, + ]; + } + + /** + * @return list> + */ + public function infoGroups(): array + { + try { + $rows = $this->redis->xinfo('GROUPS', $this->streamKey); + } catch (ServerException $exc) { + return []; + } + $out = []; + foreach ((array) $rows as $row) { + if (!is_array($row)) { + continue; + } + $out[] = [ + 'name' => (string) ($row['name'] ?? ''), + 'consumers' => (int) ($row['consumers'] ?? 0), + 'pending' => (int) ($row['pending'] ?? 0), + 'last_delivered_id' => $row['last-delivered-id'] ?? null, + 'lag' => isset($row['lag']) ? 
(int) $row['lag'] : null, + ]; + } + return $out; + } + + /** + * @return list<array<string, int|string>> + */ + public function infoConsumers(string $group): array + { + try { + $rows = $this->redis->xinfo('CONSUMERS', $this->streamKey, $group); + } catch (ServerException $exc) { + return []; + } + $out = []; + foreach ((array) $rows as $row) { + if (!is_array($row)) { + continue; + } + $out[] = [ + 'name' => (string) ($row['name'] ?? ''), + 'pending' => (int) ($row['pending'] ?? 0), + 'idle_ms' => (int) ($row['idle'] ?? 0), + ]; + } + return $out; + } + + /** + * Per-entry PEL view (id, consumer, idle, deliveries). + * + * @return list<array<string, int|string>> + */ + public function pendingDetail(string $group, int $count = 20): array + { + try { + $rows = $this->redis->xpending( + $this->streamKey, + $group, + null, + '-', + '+', + $count + ); + } catch (ServerException $exc) { + return []; + } + $out = []; + foreach ((array) $rows as $row) { + if (!is_array($row)) { + continue; + } + $out[] = [ + 'id' => (string) ($row[0] ?? ''), + 'consumer' => (string) ($row[1] ?? ''), + 'idle_ms' => (int) ($row[2] ?? 0), + 'deliveries' => (int) ($row[3] ?? 0), + ]; + } + return $out; + } + + /** + * Global counters from the Redis hash `demo:streaming:stats`. + * The per-consumer `processed`/`reaped`/`crashed_drops` counters + * live in separate worker keys and are not included here. + * + * @return array<string, int> + */ + public function stats(): array + { + $raw = $this->redis->hgetall(self::STATS_KEY); + if (!is_array($raw)) { + $raw = []; + } + return [ + 'produced_total' => (int) ($raw['produced_total'] ?? 0), + 'acked_total' => (int) ($raw['acked_total'] ?? 0), + 'claimed_total' => (int) ($raw['claimed_total'] ?? 0), + ]; + } + + public function resetStats(): void + { + $this->redis->del([self::STATS_KEY]); + } + + /** + * Drop the stream key entirely. Used by the demo's reset path. 
+ */ + public function deleteStream(): void + { + $this->redis->del([$this->streamKey]); + } + + // ------------------------------------------------------------------ + // Helpers + // ------------------------------------------------------------------ + + /** + * Flatten an XREADGROUP reply into a list of (id, fields_dict). + * + * The Predis wire shape is: + * [ [streamKey, [ [id, [k,v,k,v,...]], ... ]], ... ] + * + * @return list}> + */ + private static function flattenReadGroup($raw): array + { + if (!is_array($raw)) { + return []; + } + $out = []; + foreach ($raw as $perStream) { + if (!is_array($perStream) || !isset($perStream[1])) { + continue; + } + foreach ($perStream[1] as $entry) { + if (!is_array($entry) || !isset($entry[0])) { + continue; + } + $id = (string) $entry[0]; + $fields = self::pairsToDict(is_array($entry[1] ?? null) ? $entry[1] : []); + $out[] = [$id, $fields]; + } + } + return $out; + } + + /** + * Convert a flat [k,v,k,v,...] array (the wire shape of stream + * entry fields) into an associative dict. + * + * @param array $pairs + * @return array + */ + private static function pairsToDict(array $pairs): array + { + $out = []; + $count = count($pairs); + for ($i = 0; $i + 1 < $count; $i += 2) { + $out[(string) $pairs[$i]] = (string) $pairs[$i + 1]; + } + return $out; + } + + public static function nowMs(): int + { + return (int) round(microtime(true) * 1000); + } + + // ------------------------------------------------------------------ + // Worker-state keys (used by ConsumerWorker + demo server) + // ------------------------------------------------------------------ + + public static function workerKey(string $group, string $name, string $suffix): string + { + return self::WORKER_KEY_PREFIX . ':' . $group . ':' . $name . ':' . 
$suffix; + } +} diff --git a/content/develop/use-cases/streaming/php/_index.md b/content/develop/use-cases/streaming/php/_index.md new file mode 100644 index 0000000000..daa9be129c --- /dev/null +++ b/content/develop/use-cases/streaming/php/_index.md @@ -0,0 +1,545 @@ +--- +categories: +- docs +- develop +- stack +- oss +- rs +- rc +description: Implement a Redis event-streaming pipeline in PHP with Predis +linkTitle: Predis example (PHP) +title: Redis streaming with Predis +weight: 7 +--- + +This guide shows you how to build a Redis-backed event-streaming pipeline in PHP with [Predis](https://github.com/predis/predis). It includes a small local web server built on PHP's built-in dev server so you can produce events into a single Redis Stream, watch two independent consumer groups read it at their own pace, and recover stuck deliveries with `XAUTOCLAIM` after simulating a consumer crash. + +## Overview + +A Redis Stream is an append-only log of field/value entries with auto-generated, time-ordered IDs. Producers append with [`XADD`]({{< relref "/commands/xadd" >}}); consumers belong to *consumer groups* and read with [`XREADGROUP`]({{< relref "/commands/xreadgroup" >}}). The group as a whole tracks a single `last-delivered-id` cursor, and each consumer gets its own pending-entries list (PEL) of messages it has been handed but not yet acknowledged. Once a consumer has processed an entry it calls [`XACK`]({{< relref "/commands/xack" >}}) to clear the entry from its PEL; entries left unacknowledged past an idle threshold can be reassigned to a healthy consumer with [`XAUTOCLAIM`]({{< relref "/commands/xautoclaim" >}}). 
+ +That gives you: + +* Ordered, durable history that many independent consumer groups can read at their own pace +* At-least-once delivery, with per-consumer pending lists and automatic recovery of crashed consumers +* Horizontal scaling within a group — add a consumer and Redis automatically splits the work +* Replay of any range with [`XRANGE`]({{< relref "/commands/xrange" >}}), independent of consumer-group state +* Bounded retention through [`XADD MAXLEN ~`]({{< relref "/commands/xadd" >}}) or + [`XTRIM MINID ~`]({{< relref "/commands/xtrim" >}}), without a separate cleanup job + +In this example, producers append order events (`order.placed`, `order.paid`, `order.shipped`, `order.cancelled`) to a single stream at `demo:events:orders`. Two consumer groups read the same stream: + +* **`notifications`** — two consumers (`worker-a`, `worker-b`) sharing the work, modelling a fan-out worker pool. +* **`analytics`** — one consumer (`worker-c`) processing the full event flow on its own. + +This port is **structurally different from the other clients in this use case**: every other client runs its consumers as in-process threads or async tasks, but PHP's built-in `php -S` development server runs each HTTP request in a brand-new short-lived process, so an in-process consumer would die as soon as the request that started it returned. The helper sidesteps that by spawning each consumer as a detached OS process and keeping every piece of cross-request state in Redis. See [Production usage](#production-usage) for the longer story. + +## How it works + +The flow looks like this: + +1. The application calls `$stream->produce($eventType, $payload)` which runs [`XADD`]({{< relref "/commands/xadd" >}}) with an approximate [`MAXLEN ~`]({{< relref "/commands/xadd" >}}) cap. Redis assigns an auto-generated time-ordered ID. +2. 
Each consumer process loops on [`XREADGROUP`]({{< relref "/commands/xreadgroup" >}}) with the special ID `>` (meaning "deliver entries this group has not yet delivered to anyone") and a short block timeout. +3. After processing each entry, the consumer calls [`XACK`]({{< relref "/commands/xack" >}}) so Redis can drop it from the group's pending list. +4. If a consumer is killed (or crashes) before acking, its entries sit in the group's PEL. A periodic [`XAUTOCLAIM`]({{< relref "/commands/xautoclaim" >}}) sweep reassigns idle entries to a healthy consumer. +5. Anyone — including code outside the consumer groups — can read history with [`XRANGE`]({{< relref "/commands/xrange" >}}) without affecting any group's cursor. + +Each consumer group has its own cursor (`last-delivered-id`) and its own pending list, so the two groups in this demo process the same events without coordinating with each other. + +## The event-stream helper + +The `EventStream` class wraps the stream operations +([source](https://github.com/redis/docs/blob/main/content/develop/use-cases/streaming/php/EventStream.php)): + +```php +require __DIR__ . '/vendor/autoload.php'; +require __DIR__ . '/EventStream.php'; + +use Predis\Client as PredisClient; + +$redis = new PredisClient(['host' => '127.0.0.1', 'port' => 6379]); +$stream = new EventStream( + $redis, + 'demo:events:orders', + 2000, // retention guardrail (approximate MAXLEN) + 5000 // XAUTOCLAIM idle threshold (ms) +); + +// Producer +$streamId = $stream->produce('order.placed', [ + 'order_id' => 'o-1234', + 'customer' => 'alice', + 'amount' => '49.50', +]); + +// Consumer group + one consumer +$stream->ensureGroup('notifications', '0-0'); +$entries = $stream->consume('notifications', 'worker-a', count: 10, blockMs: 500); +foreach ($entries as [$entryId, $fields]) { + handle($fields); // your processing + $stream->ack('notifications', [$entryId]); // XACK +} + +// Recover stuck PEL entries by reaping them into a healthy consumer. 
+// The textbook pattern: each consumer periodically calls XAUTOCLAIM +// with itself as the target and processes whatever it claimed. +// `ConsumerWorker::reapIdlePel()` wraps that flow; the low-level +// helper `$stream->autoclaim($group, $target)` is also available if +// you want to drive XAUTOCLAIM directly. +$result = $worker->reapIdlePel(); +// $result == ['claimed' => N, 'processed' => M, 'deleted_ids' => [...]] +// deleted_ids are PEL entries whose payload was already trimmed. +// Redis 7+ has already removed those slots from the PEL, so no XACK +// is needed — log them and route to a dead-letter store for audit. + +// Replay history (independent of any group's cursor) +foreach ($stream->replay('-', '+', 50) as [$entryId, $fields]) { + print "$entryId " . json_encode($fields) . "\n"; +} +``` + +### Data model + +Each event is a single stream entry — a flat dict of field/value strings — with an auto-generated time-ordered ID: + +```text +demo:events:orders + 1716998413541-0 type=order.placed order_id=o-1234 customer=alice amount=49.50 ts_ms=... + 1716998413542-0 type=order.paid order_id=o-1234 customer=alice amount=49.50 ts_ms=... + 1716998413542-1 type=order.shipped order_id=o-1235 customer=bob amount=12.00 ts_ms=... + ... +``` + +The ID is `{milliseconds}-{sequence}`, monotonically increasing within the stream, so you can range-query by approximate wall-clock time without an extra index. (IDs are ordered within a stream, not across streams — two events appended to different streams at the same millisecond can produce the same ID.) 
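
The ID layout described above lends itself to a small pair of helpers. The sketch below is not part of `EventStream`: `parseStreamId()` and `minIdForMs()` are hypothetical names introduced here for illustration, and it assumes a 64-bit PHP build so the millisecond part fits in an `int`.

```php
<?php
declare(strict_types=1);

/**
 * Split a stream ID such as "1716998413541-0" into its two parts.
 * Hypothetical helper, not part of the demo source.
 *
 * @return array{ms: int, seq: int}
 */
function parseStreamId(string $id): array
{
    if (preg_match('/^(\d+)-(\d+)$/', $id, $m) !== 1) {
        throw new InvalidArgumentException("Not a stream ID: {$id}");
    }
    return ['ms' => (int) $m[1], 'seq' => (int) $m[2]];
}

/**
 * Build the lowest possible ID for a wall-clock instant given in
 * Unix milliseconds. Hypothetical helper, not part of the demo source.
 */
function minIdForMs(int $unixMs): string
{
    return $unixMs . '-0';
}

// Take an existing entry ID and derive a range start one minute earlier.
$parts = parseStreamId('1716998413541-0');
$cutoff = minIdForMs($parts['ms'] - 60_000);
```

The resulting string could be passed as the `$startId` argument of `$stream->replay()` to read roughly the last minute of history before that entry.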
+ +The PHP port also keeps a small per-process bookkeeping keyspace under `demo:streaming:*` so every fresh `php -S` request can find the running consumers: + +```text +demo:streaming:stats (hash) produced_total, acked_total, claimed_total +demo:streaming:workers (set) "{group}/{name}" entries for every spawned worker +demo:streaming:worker:{group}:{name}:pid (string) worker process PID +demo:streaming:worker:{group}:{name}:processed (string) per-consumer ack count +demo:streaming:worker:{group}:{name}:reaped (string) per-consumer XAUTOCLAIM-claimed count +demo:streaming:worker:{group}:{name}:crashed_drops (string) per-consumer drop count +demo:streaming:worker:{group}:{name}:crash_next (string) integer counter for the crash lever +demo:streaming:worker:{group}:{name}:paused (string) "1" while paused, deleted otherwise +demo:streaming:worker:{group}:{name}:idle (string) "1" while the worker has acknowledged the pause +demo:streaming:worker:{group}:{name}:recent (list) last N processed entries for the UI +``` + +The implementation uses: + +* [`XADD ... 
MAXLEN ~ n`]({{< relref "/commands/xadd" >}}), pipelined, for batch production with a retention cap +* [`XREADGROUP`]({{< relref "/commands/xreadgroup" >}}) with the special ID `>` for fresh deliveries to a consumer +* [`XACK`]({{< relref "/commands/xack" >}}) on every processed entry +* [`XAUTOCLAIM`]({{< relref "/commands/xautoclaim" >}}) for sweeping idle pending entries to a healthy consumer +* [`XCLAIM`]({{< relref "/commands/xclaim" >}}) for handing one consumer's PEL over to a peer before removing it +* [`XRANGE`]({{< relref "/commands/xrange" >}}) for replay and audit +* [`XPENDING`]({{< relref "/commands/xpending" >}}) for inspecting the per-group pending list +* [`XINFO STREAM`]({{< relref "/commands/xinfo-stream" >}}), + [`XINFO GROUPS`]({{< relref "/commands/xinfo-groups" >}}), and + [`XINFO CONSUMERS`]({{< relref "/commands/xinfo-consumers" >}}) for surface-level observability +* [`XTRIM`]({{< relref "/commands/xtrim" >}}) for explicit retention enforcement + +## Producing events + +`produceBatch()` pipelines `XADD` calls in a single round trip. Each call carries an approximate `MAXLEN ~` cap so the stream stays bounded as it rolls forward: + +```php +public function produceBatch(array $events): array +{ + $stream = $this->streamKey; + $maxlen = $this->maxlenApprox; + $results = $this->redis->pipeline(function ($pipe) use ($events, $stream, $maxlen) { + foreach ($events as [$eventType, $payload]) { + $fields = self::encodeFields($eventType, $payload); + // XADD {trim => [MAXLEN, ~, n]} + $pipe->xadd($stream, $fields, '*', ['trim' => ['MAXLEN', '~', $maxlen]]); + } + }); + // ... +} +``` + +The `~` flavour of `MAXLEN` lets Redis trim at a macro-node boundary, which is much cheaper than exact trimming and is what you want when the cap is a retention *guardrail*, not a hard size constraint. With 300 events produced and `MAXLEN ~ 50`, you might end up with 100 entries left — Redis released the oldest whole macro-node and stopped. 
Subsequent `XADD` calls keep trimming the same way, so the length stays roughly stable from then on. + +If you genuinely need an exact cap (rare), drop the `~` from the `trim` array. The performance difference is significant on busy streams. + +Predis 3.x's `xadd()` takes the fields as an **associative array** in the second argument and the trim options as a `['trim' => ['MAXLEN', '~', n]]` entry in the fourth — a different shape from `hset()`, which takes its fields variadically. Skim the [Predis stream tests](https://github.com/predis/predis/tree/main/tests/Predis/Command/Redis) if you need to confirm the signature for a command this guide doesn't show. + +## Reading with a consumer group + +Each consumer runs the same `XREADGROUP` loop. The special ID `>` means "deliver entries this group has not yet delivered to *anyone*": + +```php +public function consume(string $group, string $consumer, int $count = 10, int $blockMs = 500): array +{ + // Predis xreadgroup signature: + // xreadgroup(group, consumer, count, block, noack, key, id) + $raw = $this->redis->xreadgroup( + $group, $consumer, $count, $blockMs, false, + $this->streamKey, '>' + ); + return self::flattenReadGroup($raw); +} +``` + +`blockMs` makes the call efficient even when the stream is idle: the client parks on the server until either an entry arrives or the timeout expires, so consumers don't busy-loop. + +Reading with an explicit ID like `0-0` instead of `>` does something different — it replays entries already delivered to *this* consumer name (its private PEL). That is the canonical recovery path when the same consumer restarts: catch up on its own pending entries first, then resume reading new ones. The helper exposes that as `consumeOwnPel()`. + +## Acknowledging entries + +Once the consumer has processed an entry, `XACK` tells Redis it can drop the entry from the group's pending list. 
Predis 3.x's `xack()` takes the IDs **variadically** rather than as a single array, so the helper unpacks the list with `...`: + +```php +public function ack(string $group, array $ids): int +{ + if (empty($ids)) { + return 0; + } + $n = (int) $this->redis->xack($this->streamKey, $group, ...$ids); + if ($n > 0) { + $this->redis->hincrby(self::STATS_KEY, 'acked_total', $n); + } + return $n; +} +``` + +This is the linchpin of at-least-once delivery: an entry that is never acked stays in the PEL until a claim moves it elsewhere. If your consumer process crashes between processing and ack, the next claim sweep picks the entry back up. The one caveat is retention: `XADD MAXLEN ~` and `XTRIM` can release the entry's *payload* even while its ID is still in the PEL. The next `XAUTOCLAIM` returns those IDs in its `deletedIds` list and removes them from the PEL inside the same command — the entry cannot be retried, so the caller should log it and route to a dead-letter store for audit. + +The trade-off is the opposite of pub/sub: a slow or crashed consumer doesn't lose messages, but it does mean your downstream system must be idempotent. If you process an order twice because the first attempt died after the side effect but before the ack, the second attempt must be safe. + +## Multiple consumer groups, one stream + +The big difference between Redis Streams and a job queue is that any number of independent consumer groups can read the same stream. The demo sets up two groups on `demo:events:orders`: + +```php +$stream->ensureGroup('notifications', '0-0'); +$stream->ensureGroup('analytics', '0-0'); +``` + +Each group has its own cursor. Producing 5 events results in `notifications` and `analytics` each receiving all 5, with no coordination between them. 
Within `notifications`, the work is split across `worker-a` and `worker-b`: Redis hands each `XREADGROUP` call whatever entries are not yet delivered to anyone in the group, so adding a second worker doubles throughput without any rebalance logic. + +The `'0-0'` argument means "deliver everything in the stream from the beginning" — useful in a demo and for fresh groups bootstrapping from history. In production, a brand-new group reading a long-existing stream usually starts at `$` ("only events after this point") and uses [`XRANGE`]({{< relref "/commands/xrange" >}}) explicitly if it needs history. + +## Recovering crashed consumers with XAUTOCLAIM + +The demo's "Crash next 3" button tells a chosen consumer to drop its next three deliveries on the floor without acking them — the same effect as a worker process dying mid-message. Those entries stay in the group's PEL with their delivery counter incremented. Once they have been idle for at least `claimMinIdleMs`, any healthy consumer in the group can rescue them by calling `XAUTOCLAIM` *with itself as the target*. `ConsumerWorker::reapIdlePel()` wraps that pattern: + +```php +public function reapIdlePel(): array +{ + $result = $this->stream->autoclaim($this->group, $this->name, 100, '0-0', 10); + $claimed = $result['claimed']; + $deletedIds = $result['deletedIds']; + + $processed = 0; + foreach ($claimed as [$entryId, $fields]) { + try { + $this->handleEntry($entryId, $fields, /*viaReap*/ true); + $processed++; + } catch (\Throwable $exc) { + fwrite(STDERR, "[{$this->group}/{$this->name}] reap failed on {$entryId}: " . $exc->getMessage() . 
"\n"); + } + } + return [ + 'claimed' => count($claimed), + 'processed' => $processed, + 'deleted_ids' => $deletedIds, + ]; +} +``` + +The underlying `$stream->autoclaim()` helper pages through the group's PEL with `XAUTOCLAIM`'s continuation cursor: + +```php +public function autoclaim( + string $group, string $consumer, + int $pageCount = 100, string $startId = '0-0', int $maxPages = 10 +): array { + $claimedAll = []; $deletedAll = []; $cursor = $startId; + for ($i = 0; $i < $maxPages; $i++) { + $reply = $this->redis->xautoclaim( + $this->streamKey, $group, $consumer, + $this->claimMinIdleMs, $cursor, $pageCount + ); + // Reply shape: [nextCursor, [[id, [k,v,k,v,...]], ...], [deletedIds...]] + $nextCursor = (string) $reply[0]; + foreach (($reply[1] ?? []) as $entry) { + $claimedAll[] = [(string) $entry[0], self::pairsToDict($entry[1] ?? [])]; + } + foreach (($reply[2] ?? []) as $id) { $deletedAll[] = (string) $id; } + if ($nextCursor === '0-0') break; + $cursor = $nextCursor; + } + return ['claimed' => $claimedAll, 'deletedIds' => $deletedAll]; +} +``` + +A single `XAUTOCLAIM` call scans up to `pageCount` PEL entries starting at `startId`, reassigns the ones idle for at least `claimMinIdleMs` to the named consumer, and returns a continuation cursor in the first slot of the reply. For a full sweep, loop until the cursor returns to `0-0` (with a `maxPages` safety net so one call cannot monopolise a very large PEL). The delivery counter is incremented on every claim — after a few cycles you can use it to spot a *poison-pill* message that crashes every consumer that touches it, and route it to a dead-letter stream so the bad entry stops cycling. (New entries keep flowing past the poison pill — `XREADGROUP >` still delivers fresh work — but the bad entry's repeated reclaim wastes consumer time and keeps the PEL larger than it needs to be.) 
+ +The `deletedIds` list contains PEL entry IDs whose stream payload was already trimmed by the time the claim ran (typically because `MAXLEN ~` retention outran a slow consumer). `XAUTOCLAIM` removes those dangling slots from the PEL itself, so the caller does *not* need to `XACK` them — but the entries cannot be retried either, so log and route them to a dead-letter store for offline inspection. Redis 7.0 introduced this third return element; the example requires Redis 7.0+ for that reason. + +`reapIdlePel` is the right primitive for the recovery path because it claims and processes in one step: every entry the call returned is now in *this* consumer's PEL, so the same consumer is responsible for processing and acking it. In production each consumer process runs `reapIdlePel` periodically (every few seconds, on a timer) so a crashed peer's entries never sit invisibly. The demo exposes it as a manual button so you can trigger the reap after waiting for the idle threshold. + +`XCLAIM` (singular, no auto) does the same thing for a specific list of entry IDs you already have in hand — useful when you want to take ownership of one known stuck entry, or when you need to move a specific consumer's PEL to a peer (the case the demo's "Remove consumer" button handles via `handoverPending()`). `XAUTOCLAIM` cannot filter by source consumer, so it cannot be used for a per-consumer handover. + +## Replay with XRANGE + +`XRANGE` reads a slice of history. It is completely independent of any consumer group — no cursors move, no acks happen — so it is safe to call any number of times, from any process: + +```php +public function replay(string $startId = '-', string $endId = '+', int $count = 100): array +{ + $raw = $this->redis->xrange($this->streamKey, $startId, $endId, $count); + $out = []; + foreach ($raw as $id => $fields) { + $out[] = [(string) $id, is_array($fields) ? 
$fields : []]; + } + return $out; +} +``` + +The special IDs `-` and `+` mean "from the very beginning" and "to the very end". You can also pass real IDs (`1716998413541-0`) or just the millisecond part (`1716998413541`, which Redis interprets as "any entry with this timestamp"). + +Typical uses: + +* **Bootstrapping a new projection** — read the entire stream from `-` and build a derived view in another store (a search index, a SQL table, a different cache). Doing this against a consumer group would consume the entries; `XRANGE` lets you do it without disrupting live consumers. +* **Auditing recent activity** — read the last few minutes by ID range without touching any group cursor. +* **Debugging** — fetch one specific entry by its ID, or a tight range around an incident timestamp, to see exactly what producers wrote. + +## The consumer worker process + +`ConsumerWorker` wraps the `XREADGROUP` → process → `XACK` loop and is intended to run as a **separate CLI process** +([source](https://github.com/redis/docs/blob/main/content/develop/use-cases/streaming/php/ConsumerWorker.php)): + +```php +public function run(): void +{ + // SIGTERM handler so the demo server's posix_kill gives the + // worker a chance to leave the loop cleanly. + $stop = false; + pcntl_async_signals(true); + pcntl_signal(SIGTERM, function () use (&$stop) { $stop = true; }); + + while (!$stop) { + // Cross-process pause flag (see Production usage). + if ((string) $this->redis->get($pausedKey) === '1') { + $this->redis->set($idleKey, '1'); + usleep(20 * 1000); + continue; + } + $entries = $this->stream->consume($this->group, $this->name, 10, 500); + foreach ($entries as [$entryId, $fields]) { + usleep($this->processLatencyMs * 1000); + $this->handleEntry($entryId, $fields, /*viaReap*/ false); + } + } +} +``` + +`handleEntry()` either acks (the normal path) or, when the demo's `crash_next` counter is `> 0`, drops the entry on the floor and increments the per-consumer `crashed_drops` count. 
The crash check uses a tiny Lua script (`GET` + conditional `DECR`) so two simultaneous deliveries can't both decrement the counter below zero. + +Recovery of stuck PEL entries — this consumer's, after a restart, or another consumer's, after a crash — runs through `reapIdlePel()` rather than the read loop. That method calls `XAUTOCLAIM` with this consumer as the target, then processes whatever was claimed in the same flow as new entries. This is the textbook Streams pattern: each consumer is its own reaper, running `XAUTOCLAIM(self)` periodically (or on demand) so a crashed peer's entries never sit invisibly in the PEL. The demo's "XAUTOCLAIM to selected" button calls `reapIdlePel()` on the chosen consumer; in production you would run it from a timer every few seconds. + +Note that the worker's main read loop deliberately does *not* call `XREADGROUP 0` to drain its own PEL on every iteration. That would re-deliver every pending entry continuously and *reset its idle counter to zero* each time, which would keep crashed entries below the `XAUTOCLAIM` threshold forever. Using `XAUTOCLAIM(self)` as the recovery primitive — which only fires for entries idle longer than `claimMinIdleMs` — avoids that whole class of bug. + +The pause and crash levers exist only for the demo. A real consumer is just the read-process-ack loop — everything else in this class is instrumentation. + +## Prerequisites + +* Redis 7.0 or later. `XAUTOCLAIM` was added in Redis 6.2, but its reply gained a third + element (the list of deleted IDs) in 7.0; the example relies on that shape. +* PHP 8.1 or later, with the `pcntl` and `posix` extensions enabled (both ship with the + official PHP binary on macOS and most Linux distros). +* The Predis client (3.x). 
Install it with [Composer](https://getcomposer.org/): + + ```bash + composer require "predis/predis:^3.0" + ``` + +If your Redis server is running elsewhere, start the demo with `REDIS_HOST=...` and `REDIS_PORT=...` (see [Start the demo server](#start-the-demo-server) below). + +## Running the demo + +### Get the source files + +The demo consists of four files plus the Composer manifest. Download them from the [`php` source folder](https://github.com/redis/docs/tree/main/content/develop/use-cases/streaming/php) on GitHub, or grab them with `curl`: + +```bash +mkdir streaming-demo && cd streaming-demo +BASE=https://raw.githubusercontent.com/redis/docs/main/content/develop/use-cases/streaming/php +curl -O $BASE/EventStream.php +curl -O $BASE/ConsumerWorker.php +curl -O $BASE/demo_server.php +curl -O $BASE/composer.json +``` + +Then install dependencies: + +```bash +composer install +``` + +### Start the demo server + +From that directory: + +```bash +php -S 127.0.0.1:8083 demo_server.php +``` + +You should see: + +```text +[...] PHP 8.4.6 Development Server (http://127.0.0.1:8083) started +``` + +By default the demo wipes the configured stream key on the first request after startup, so each run starts from a clean state. PHP's built-in `php -S` server doesn't pass user CLI flags through to the script, so the demo uses environment variables for the equivalent overrides: + +| Env var | CLI equivalent | Default | Meaning | +|----------------------|----------------------|------------------------|-----------------------------------------------------------------------------------------------| +| `REDIS_HOST` | `--redis-host` | `127.0.0.1` | Redis host the demo server and every worker connect to. | +| `REDIS_PORT` | `--redis-port` | `6379` | Redis port. | +| `STREAM_KEY` | `--stream-key` | `demo:events:orders` | The Redis Stream key the demo writes to and reads from. | +| `MAXLEN` | `--maxlen` | `2000` | Approximate `MAXLEN ~` cap on every `XADD`. 
| +| `CLAIM_IDLE_MS` | `--claim-idle-ms` | `5000` | Minimum idle time before `XAUTOCLAIM` may reassign a pending entry. | +| `NO_RESET` | `--no-reset` | (reset on first request) | Set to `1` to keep any existing data at `STREAM_KEY` instead of dropping it on first request. | +| `PROCESS_LATENCY_MS` | — | `25` | Per-entry processing latency the workers simulate (purely for visualisation). | + +For example, to point the demo at a different stream and tighten the autoclaim window: + +```bash +STREAM_KEY=demo:events:orders-php CLAIM_IDLE_MS=500 php -S 127.0.0.1:8083 demo_server.php +``` + +Open [http://127.0.0.1:8083](http://127.0.0.1:8083) in a browser. You can: + +* **Produce** any number of events of a chosen type (or random types). Watch the stream length grow and the tail update. +* See each **consumer group**: its `last-delivered-id`, the size of its pending list, and the consumers in it. Each consumer shows its processed count, pending count, and idle time. +* **Add or remove** consumers within a group at runtime to see Redis split the work across the new shape. +* Click **Crash next 3** on a consumer to drop its next three deliveries — the same effect as a worker process dying after `XREADGROUP` but before `XACK`. Watch the **Pending entries (XPENDING)** panel fill up. +* Wait until the idle time exceeds the threshold (default 5000 ms), pick a healthy target consumer, and click **XAUTOCLAIM to selected** — the stuck entries are reassigned and the delivery counter increments. +* **Replay (XRANGE)** any range to confirm the full history is independent of consumer-group state. +* **XTRIM** with an approximate `MAXLEN` to bound retention. Note that an approximate trim only releases whole macro-nodes — `MAXLEN ~ 50` on a small stream may not delete anything; on a 300-entry stream it typically lands at around 100. +* Click **Reset demo** to drop the stream, kill every worker, and re-seed the default groups. 
+ +### Stopping the demo cleanly + +`php -S` doesn't run a shutdown handler when you Ctrl-C out of it, and the consumer worker processes — which are *intentionally* detached so they survive request boundaries — will outlive the demo server unless you stop them first. Before stopping the server, click **Reset demo** in the UI (which kills every worker), or run: + +```bash +curl -X POST http://127.0.0.1:8083/reset +``` + +If you forgot, clean up by name: + +```bash +pgrep -f ConsumerWorker.php | xargs kill +redis-cli --scan --pattern 'demo:streaming:*' | xargs redis-cli del +``` + +## Production usage + +### Why this PHP port differs from the others + +Every other client in this use case keeps consumers as in-process objects with a background thread (Python's `threading.Thread`, .NET's task pool, Node's event loop, Go's goroutines, etc.). That works because those runtimes have a long-lived server process that owns the consumer's connection, callback, and dispatch loop. + +PHP's traditional one-process-per-request model — used by `php -S`, mod_php, PHP-FPM with the default `pm` setting — fundamentally doesn't fit that shape. A consumer created inside an HTTP handler dies the moment the handler returns. Even if you used a long-running PHP daemon (Roadrunner, Swoole, ReactPHP), you'd still need separate worker processes if you wanted multiple independent consumers, because Predis's blocking `XREADGROUP` call blocks the calling process. + +This port therefore keeps each consumer as a **separate OS process**, with its full state (PID, processed/reaped/dropped counts, recent buffer) persisted in Redis. Every HTTP request reconstructs its view of the consumer registry from those keys. 
The pattern is closer to how a real production PHP application would run stream consumers: a `supervisord`, `systemd`, or container orchestrator drives N copies of `ConsumerWorker.php`, each owning one logical consumer in one logical group, and the web tier never tries to host a consumer itself. The demo just inlines the supervision (via `proc_open` + `posix_kill`) so a single `php -S` command is enough to play with the pattern end-to-end. + +Two cross-process subtleties are worth calling out, because both will bite anyone who tries to copy this pattern naively: + +* **Capture the worker's real PID, not a wrapper's.** A `proc_open(['setsid', '-f', $args...])` call returns the *wrapper's* PID: `setsid -f` forks, exec's the worker as the new session leader, and the wrapper exits. A subsequent `posix_kill($recordedPid, SIGTERM)` then signals a dead PID and the worker survives. The fix is a shell-wrapped `& echo $!` pattern that backgrounds the worker and echoes its real PID back through the wrapper's stdout pipe, which is what `spawn_worker()` in `demo_server.php` does on both Linux and macOS. +* **Pause/resume across processes uses Redis flags, not in-process events.** The reference Python port uses a `threading.Event` to park a consumer while the demo server hands its PEL off to a peer; this port uses two Redis keys per worker (`paused` and `idle`). The demo server `SET`s `paused=1`, waits for the worker to write `idle=1` (the worker checks the flag at the top of every loop iteration with a 20 ms spin-wait), runs the surgical operation, then `DEL`s both keys. That's what makes the "Remove consumer" handover safe even though the demo server can't touch the worker's in-memory state directly. + +### Pick retention by length or by minimum ID + +The demo uses `MAXLEN ~` on every `XADD`. Two alternatives are worth considering: + +* `MINID ~` with a threshold ID keeps only entries at or after that ID. If you want "the last 24 hours", compute the wall-clock cutoff in milliseconds and pass it to `XTRIM` with `MINID ~`, using the cutoff followed by `-0` as the threshold ID.
This is the right pattern when retention is time-bounded. +* No cap on `XADD` plus a periodic `XTRIM` job is useful if your producer is hot and the per-`XADD` work has to stay minimal, or if retention rules are complex (a separate process can also factor in consumer-group lag). + +All three options as described use approximate trimming (the `~` modifier); Redis trims exactly when `~` is omitted. Use exact trimming (`MAXLEN n` or `MINID id` without `~`) only when you genuinely need an exact count. + +### Don't let consumer-group lag silently grow + +`XINFO GROUPS` reports each group's `lag` (entries the group has not yet read) and `pending` (entries delivered but not acked). In production, alert on either of these crossing a threshold: a steadily growing pending count usually means consumers are crashing without `XAUTOCLAIM` running, and a growing lag means consumers can't keep up with producers. + +The same applies inside a group: `XINFO CONSUMERS` reports per-consumer pending counts and idle times, so you can spot one slow consumer holding entries that the rest of the group is waiting on. + +### Make consumer logic idempotent + +`XAUTOCLAIM` can re-deliver an entry to a different consumer after a crash. If your processing has side effects (sending email, charging a card, updating a downstream store), make sure the same entry processed twice gives the same result: use an idempotency key, an upsert with conditional check, or a once-per-id guard table. Redis Streams cannot give you exactly-once semantics on its own. + +### Bound the delivery counter as a poison-pill signal + +`XPENDING` returns each entry's delivery count, incremented on every claim. If an entry has been delivered (and dropped) several times, the next consumer is unlikely to fare better. After some threshold (`deliveries >= 5`, say), route the entry to a *dead-letter stream*, ack it on the original group, and alert.
New entries keep flowing past a poison pill (`XREADGROUP >` still delivers fresh work), but the bad entry's repeated reclaim wastes consumer time and keeps the PEL bigger than it needs to be — without a DLQ threshold it can also slowly trip retention/lag alerts. + +### Partition by tenant or entity for scale + +A single Redis Stream is a single key, and on a Redis Cluster a single key lives on a single shard. If your throughput exceeds what one shard can handle, partition the stream — for example by tenant ID (`events:orders:{tenant_a}`, `events:orders:{tenant_b}`) — so different tenants land on different shards. Hash-tags (`{tenant_a}`) keep all related streams on the same shard if you need to multi-stream atomically. + +Per-entity partitioning (`events:order:{order_id}`) is the canonical pattern when you treat each entity's stream as the event-sourcing log for that entity: every state change for one order goes on its own stream, which is also bounded in size by the entity's lifetime. + +### Use a separate consumer pool per group + +The demo runs every consumer alongside one demo server. In production each consumer group is usually its own deployment — its own pool of pods or VMs — so a slow projection in `analytics` cannot pull `notifications` workers off their stream. Each pod runs one consumer process per CPU core, with `XAUTOCLAIM` either embedded in the consumer loop (every N reads, claim idle entries to self) or run by a separate reaper. `supervisord`, `systemd`, or a container orchestrator owns the process lifecycle, not your web tier. + +### Don't read with XREAD (no group) and then try to ack + +`XREAD` and `XREADGROUP` are different mechanisms. `XREAD` is a tail-the-log read with no consumer-group state — entries are not added to any PEL, and you cannot `XACK` them. If you want at-least-once delivery and crash recovery, you must read through a consumer group. 
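
The difference is easy to see from `redis-cli`. The following is a rough sketch rather than part of the demo. It assumes a local Redis with at least one entry in the stream, and it uses a scratch group so the demo's own groups are left alone. `cli-probe`, `probe-1`, and `compare_reads` are made-up names, and taking the entry ID from line 2 of the output relies on `redis-cli` printing one element per line when its output is piped.

```bash
#!/usr/bin/env bash
# Sketch: contrast XREAD with XREADGROUP using a scratch group.
# STREAM, GROUP, CONSUMER and compare_reads() are illustrative names.
STREAM="${STREAM:-demo:events:orders}"
GROUP="cli-probe"
CONSUMER="probe-1"

compare_reads() {
  # Plain XREAD from the start of the stream: nothing is recorded in
  # any PEL, so there is nothing to XACK afterwards.
  redis-cli XREAD COUNT 1 STREAMS "$STREAM" 0

  # Create a scratch group that starts at the beginning of the stream.
  redis-cli XGROUP CREATE "$STREAM" "$GROUP" 0

  # XREADGROUP delivers one entry and records it in the group's PEL.
  # Assumption: in piped output, line 2 is the entry ID.
  local id
  id=$(redis-cli XREADGROUP GROUP "$GROUP" "$CONSUMER" COUNT 1 \
        STREAMS "$STREAM" '>' | sed -n 2p)

  # The entry stays pending until it is acknowledged.
  redis-cli XPENDING "$STREAM" "$GROUP" - + 10
  redis-cli XACK "$STREAM" "$GROUP" "$id"
  redis-cli XPENDING "$STREAM" "$GROUP"

  # Remove the scratch group so it doesn't linger on the stream.
  redis-cli XGROUP DESTROY "$STREAM" "$GROUP"
}
```

Calling `compare_reads` should show one pending entry for the scratch group before the `XACK` and none after it, while the plain `XREAD` leaves no trace in any group.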
+ +`XREAD` is still useful for read-only tail clients (a UI streaming events, a debugger, a `tail -f`-style command-line tool). It's just not part of the at-least-once path. + +### Inspect the stream directly with redis-cli + +When testing or troubleshooting, inspect the stream directly to confirm the consumer state is what you expect: + +```bash +# Stream summary +redis-cli XLEN demo:events:orders +redis-cli XINFO STREAM demo:events:orders + +# Group cursors and pending counts +redis-cli XINFO GROUPS demo:events:orders + +# Consumers within a group +redis-cli XINFO CONSUMERS demo:events:orders notifications + +# Pending entries with idle time and delivery count +redis-cli XPENDING demo:events:orders notifications - + 20 + +# Tail the stream live (no consumer-group state — like tail -f) +redis-cli XREAD BLOCK 0 STREAMS demo:events:orders '$' + +# Replay a range +redis-cli XRANGE demo:events:orders - + COUNT 50 + +# The PHP port's own bookkeeping keys +redis-cli --scan --pattern 'demo:streaming:*' +``` + +If a group's `lag` is growing while consumers' `idle` times are short, consumers are healthy but producers are outpacing them — add more consumers. If `pending` is growing while `lag` is small, consumers are *receiving* entries but not *acking* them — either they are crashing mid-message or your acking logic has a bug. + +## Learn more + +This example uses the following Redis commands: + +* [`XADD`]({{< relref "/commands/xadd" >}}) to append an event with an approximate `MAXLEN` cap. +* [`XREADGROUP`]({{< relref "/commands/xreadgroup" >}}) to read new entries for a consumer in a group. +* [`XACK`]({{< relref "/commands/xack" >}}) to acknowledge a processed entry. +* [`XAUTOCLAIM`]({{< relref "/commands/xautoclaim" >}}) to reassign idle pending entries to a healthy consumer. +* [`XCLAIM`]({{< relref "/commands/xclaim" >}}) to reassign specific entry IDs (used here to hand a consumer's PEL over to a peer before deletion). 
+* [`XRANGE`]({{< relref "/commands/xrange" >}}) for replay and audit, independent of consumer-group state. +* [`XPENDING`]({{< relref "/commands/xpending" >}}) to inspect the per-group pending list with idle times and delivery counts. +* [`XTRIM`]({{< relref "/commands/xtrim" >}}) for explicit retention enforcement. +* [`XGROUP CREATE`]({{< relref "/commands/xgroup-create" >}}) and + [`XGROUP DELCONSUMER`]({{< relref "/commands/xgroup-delconsumer" >}}) to manage groups and consumers. +* [`XINFO STREAM`]({{< relref "/commands/xinfo-stream" >}}), + [`XINFO GROUPS`]({{< relref "/commands/xinfo-groups" >}}), and + [`XINFO CONSUMERS`]({{< relref "/commands/xinfo-consumers" >}}) for observability. + +See the [Predis README](https://github.com/predis/predis) for the full client reference, and the [Streams overview]({{< relref "/develop/data-types/streams" >}}) for the deeper conceptual model — consumer groups, the PEL, claim semantics, capped streams, and the differences with Kafka partitions. 
diff --git a/content/develop/use-cases/streaming/php/composer.json b/content/develop/use-cases/streaming/php/composer.json new file mode 100644 index 0000000000..4b670e7300 --- /dev/null +++ b/content/develop/use-cases/streaming/php/composer.json @@ -0,0 +1,8 @@ +{ + "name": "redis/streaming-php-demo", + "description": "Redis streaming demo using Predis (PHP).", + "require": { + "php": ">=8.1", + "predis/predis": "^3.0" + } +} diff --git a/content/develop/use-cases/streaming/php/demo_server.php b/content/develop/use-cases/streaming/php/demo_server.php new file mode 100644 index 0000000000..561959fc13 --- /dev/null +++ b/content/develop/use-cases/streaming/php/demo_server.php @@ -0,0 +1,1200 @@ + ['worker-a', 'worker-b'], + 'analytics' => ['worker-c'], +]; + +const EVENT_TYPES = [ + 'order.placed', + 'order.paid', + 'order.shipped', + 'order.cancelled', +]; + +const SEED_FLAG_KEY = 'demo:streaming:seeded'; +const WORKER_REGISTRY_KEY = 'demo:streaming:workers'; + +// --------------------------------------------------------------------- +// Connect +// --------------------------------------------------------------------- + +try { + $redis = new PredisClient([ + 'host' => $redisHost, + 'port' => $redisPort, + ]); + $redis->ping(); +} catch (\Throwable $exc) { + http_response_code(500); + header('Content-Type: text/plain'); + echo "Failed to connect to Redis at {$redisHost}:{$redisPort}: " . $exc->getMessage(); + return; +} + +$stream = new EventStream($redis, $streamKey, $maxlen, $claimIdleMs); + +// First-request bootstrap: reset (unless NO_RESET=1) and seed the +// default groups + workers. We use a Redis-side flag so subsequent +// requests don't re-run this every time. 
+if (!$redis->exists(SEED_FLAG_KEY)) { + if ($resetOnStart) { + reset_demo($stream, $redis, $processLatencyMs); + } else { + seed_groups_and_workers($stream, $redis, $processLatencyMs); + } + $redis->set(SEED_FLAG_KEY, '1'); +} + +// --------------------------------------------------------------------- +// Routing +// --------------------------------------------------------------------- + +$method = $_SERVER['REQUEST_METHOD']; +$path = parse_url($_SERVER['REQUEST_URI'] ?? '/', PHP_URL_PATH) ?: '/'; + +if ($method === 'GET' && ($path === '/' || $path === '/index.html')) { + send_html(render_page($stream)); + return; +} +if ($method === 'GET' && $path === '/state') { + send_json(build_state($stream, $redis)); + return; +} +if ($method === 'GET' && $path === '/replay') { + handle_replay($stream); + return; +} +if ($method === 'POST' && $path === '/produce') { + handle_produce($stream); + return; +} +if ($method === 'POST' && $path === '/add-worker') { + handle_add_worker($stream, $redis, $processLatencyMs); + return; +} +if ($method === 'POST' && $path === '/remove-worker') { + handle_remove_worker($stream, $redis); + return; +} +if ($method === 'POST' && $path === '/crash') { + handle_crash($stream); + return; +} +if ($method === 'POST' && $path === '/autoclaim') { + handle_autoclaim($stream); + return; +} +if ($method === 'POST' && $path === '/trim') { + handle_trim($stream); + return; +} +if ($method === 'POST' && $path === '/reset') { + $count = reset_demo($stream, $redis, $processLatencyMs); + send_json(['consumers' => $count]); + return; +} + +http_response_code(404); +echo 'Not Found'; + +// ===================================================================== +// Helpers +// ===================================================================== + +function reset_demo(EventStream $stream, $redis, int $processLatencyMs): int +{ + // Stop every known worker, drop the stream, zero counters, re-seed. 
+ $registered = $redis->smembers(WORKER_REGISTRY_KEY); + foreach ((array) $registered as $entry) { + [$g, $n] = parse_worker_key((string) $entry); + if ($g !== '' && $n !== '') { + kill_worker($redis, $g, $n); + ConsumerWorker::deleteWorkerState($redis, $g, $n); + } + } + $redis->del([WORKER_REGISTRY_KEY]); + $stream->deleteStream(); + $stream->resetStats(); + + return seed_groups_and_workers($stream, $redis, $processLatencyMs); +} + +function seed_groups_and_workers(EventStream $stream, $redis, int $processLatencyMs): int +{ + $count = 0; + foreach (DEFAULT_GROUPS as $group => $names) { + $stream->ensureGroup($group, '0-0'); + foreach ($names as $name) { + spawn_worker($stream, $redis, $group, $name, $processLatencyMs); + $count++; + } + } + return $count; +} + +function spawn_worker(EventStream $stream, $redis, string $group, string $name, int $processLatencyMs): bool +{ + if ($redis->sismember(WORKER_REGISTRY_KEY, worker_key($group, $name))) { + // Already registered. Verify it's alive; if not, clear and respawn. + $existing = (int) $redis->get(EventStream::workerKey($group, $name, 'pid')); + if ($existing > 0 && ConsumerWorker::isAlive($existing)) { + return false; + } + $redis->srem(WORKER_REGISTRY_KEY, [worker_key($group, $name)]); + ConsumerWorker::deleteWorkerState($redis, $group, $name); + } + + $workerScript = __DIR__ . '/ConsumerWorker.php'; + $phpBinary = PHP_BINARY ?: 'php'; + $args = [ + $phpBinary, + $workerScript, + '--group', $group, + '--name', $name, + '--redis-host', getenv('REDIS_HOST') ?: '127.0.0.1', + '--redis-port', (string) (getenv('REDIS_PORT') ?: 6379), + '--stream-key', $stream->streamKey(), + '--maxlen', (string) $stream->maxlenApprox(), + '--claim-idle-ms', (string) $stream->claimMinIdleMs(), + '--process-latency-ms', (string) $processLatencyMs, + ]; + + // The dev server's listen socket leaks into any child we don't + // detach from. 
The shell wrapper pattern below backgrounds the + // worker (`&`) and echoes its real PID via `$!`, redirects stdio + // away from any inherited socket, and (on Linux) prepends + // `setsid` so the worker becomes its own session leader. On + // macOS, `setsid` isn't shipped — the wrapper alone is enough. + // + // We deliberately do NOT use `proc_open(['setsid','-f',...])` + // here: setsid -f forks once, leaving the now-dead wrapper as + // the captured PID and the actual worker disconnected. Capturing + // `$!` from a shell that owns the background job is the only + // reliable way to get the worker's own PID across both platforms. + // Use /tmp directly rather than sys_get_temp_dir() so the log + // path is predictable on macOS (where sys_get_temp_dir returns a + // per-user /var/folders/... path) and easy to inspect. + $logFile = '/tmp/streaming-php-worker-' . preg_replace('/[^A-Za-z0-9_-]/', '_', $group . '-' . $name) . '.log'; + $escaped = array_map('escapeshellarg', $args); + $workerCmd = implode(' ', $escaped); + $prefix = (PHP_OS_FAMILY === 'Darwin') ? '' : 'setsid '; + $shellCmd = sprintf( + '%s%s >>%s 2>&1 ['file', '/dev/null', 'r'], + 1 => ['pipe', 'w'], + 2 => ['file', $logFile, 'a'], + ]; + $proc = proc_open($procArgs, $descriptorSpec, $pipes); + if (!is_resource($proc)) { + return false; + } + $pidLine = trim((string) fgets($pipes[1])); + foreach ($pipes as $p) { + if (is_resource($p)) { + fclose($p); + } + } + proc_close($proc); + $pid = (int) $pidLine; + if ($pid <= 0) { + return false; + } + + $redis->set(EventStream::workerKey($group, $name, 'pid'), (string) $pid); + $redis->sadd(WORKER_REGISTRY_KEY, [worker_key($group, $name)]); + return true; +} + +function kill_worker($redis, string $group, string $name): bool +{ + $pid = (int) $redis->get(EventStream::workerKey($group, $name, 'pid')); + if ($pid <= 0 || !function_exists('posix_kill')) { + return false; + } + @posix_kill($pid, defined('SIGTERM') ? 
SIGTERM : 15); + // Give the worker a chance to leave the loop cleanly (its + // XREADGROUP block is up to 500 ms). + for ($i = 0; $i < 12; $i++) { + if (!ConsumerWorker::isAlive($pid)) { + return true; + } + usleep(60 * 1000); + } + @posix_kill($pid, defined('SIGKILL') ? SIGKILL : 9); + return true; +} + +/** + * Pause every worker in the named group (except the optional opt-out) + * and wait for each to acknowledge `idle=1`. + * + * Returns ['touched' => list<[group, name]> of workers we asked to + * pause, 'failed' => list<[group, name]> of workers that did not + * acknowledge before their per-worker deadline]. **The caller MUST + * check `failed` and act accordingly** — a worker that never paused + * may still be reading new entries onto its PEL, so any subsequent + * destructive operation (handover, DELCONSUMER, reset) can race the + * worker and lose work. A non-empty `failed` is a hard error for + * remove-worker; a soft warning for autoclaim/reset. + * + * Each worker gets its own deadline (rather than sharing one across + * the loop) so a slow first worker doesn't eat the budget for the + * rest. + */ +function pause_workers_in_group($redis, string $group, ?string $exceptName = null, float $timeoutSec = 1.5): array +{ + $registered = $redis->smembers(WORKER_REGISTRY_KEY); + $touched = []; + foreach ((array) $registered as $entry) { + [$g, $n] = parse_worker_key((string) $entry); + if ($g !== $group) { + continue; + } + if ($exceptName !== null && $n === $exceptName) { + continue; + } + $redis->set(EventStream::workerKey($g, $n, 'paused'), '1'); + $touched[] = [$g, $n]; + } + // Wait for each touched worker to acknowledge it's idle. Each + // worker gets its own deadline; the previous implementation shared + // one deadline across the loop so a slow first worker could leave + // zero budget for the rest, and the silent-fallthrough exit on + // deadline expiry left the caller thinking the pause succeeded. 
+ $failed = []; + foreach ($touched as [$g, $n]) { + $deadline = microtime(true) + $timeoutSec; + $acked = false; + while (microtime(true) < $deadline) { + if ((string) $redis->get(EventStream::workerKey($g, $n, 'idle')) === '1') { + $acked = true; + break; + } + usleep(20 * 1000); + } + if (!$acked) { + $failed[] = [$g, $n]; + } + } + return ['touched' => $touched, 'failed' => $failed]; +} + +function resume_workers($redis, array $workers): void +{ + foreach ($workers as [$g, $n]) { + $redis->del([ + EventStream::workerKey($g, $n, 'paused'), + EventStream::workerKey($g, $n, 'idle'), + ]); + } +} + +function worker_key(string $group, string $name): string +{ + return $group . '/' . $name; +} + +function parse_worker_key(string $key): array +{ + $pos = strpos($key, '/'); + if ($pos === false) { + return ['', '']; + } + return [substr($key, 0, $pos), substr($key, $pos + 1)]; +} + +function list_workers($redis): array +{ + $registered = $redis->smembers(WORKER_REGISTRY_KEY); + $out = []; + foreach ((array) $registered as $entry) { + [$g, $n] = parse_worker_key((string) $entry); + if ($g !== '' && $n !== '') { + $out[] = [$g, $n]; + } + } + // Sort by group then name for stable rendering. + usort($out, function ($a, $b) { + $c = strcmp($a[0], $b[0]); + return $c !== 0 ? $c : strcmp($a[1], $b[1]); + }); + return $out; +} + +// --------------------------------------------------------------------- +// HTTP handlers +// --------------------------------------------------------------------- + +function handle_produce(EventStream $stream): void +{ + $params = read_form_data(); + $count = max(1, min(500, (int) ($params['count'] ?? 1))); + $type = trim((string) ($params['type'] ?? '')); + + $events = []; + for ($i = 0; $i < $count; $i++) { + $picked = $type !== '' ? 
$type : EVENT_TYPES[array_rand(EVENT_TYPES)]; + $events[] = [$picked, fake_payload()]; + } + $ids = $stream->produceBatch($events); + send_json(['produced' => count($ids), 'ids' => $ids]); +} + +function handle_add_worker(EventStream $stream, $redis, int $processLatencyMs): void +{ + $params = read_form_data(); + $group = trim((string) ($params['group'] ?? '')); + $name = trim((string) ($params['name'] ?? '')); + if ($group === '' || $name === '') { + send_json(['error' => 'group and name are required'], 400); + return; + } + $stream->ensureGroup($group, '0-0'); + $added = spawn_worker($stream, $redis, $group, $name, $processLatencyMs); + if (!$added) { + send_json(['error' => "{$group}/{$name} already exists"], 409); + return; + } + send_json(['group' => $group, 'name' => $name]); +} + +function handle_remove_worker(EventStream $stream, $redis): void +{ + $params = read_form_data(); + $group = trim((string) ($params['group'] ?? '')); + $name = trim((string) ($params['name'] ?? '')); + if ($group === '' || $name === '') { + send_json(['error' => 'group and name are required'], 400); + return; + } + if (!$redis->sismember(WORKER_REGISTRY_KEY, worker_key($group, $name))) { + send_json(['removed' => false, 'reason' => 'not-found'], 200); + return; + } + // Find a peer in the same group to hand the PEL over to. + $peer = null; + foreach (list_workers($redis) as [$g, $n]) { + if ($g === $group && $n !== $name) { + $peer = $n; + break; + } + } + if ($peer === null) { + send_json([ + 'removed' => false, + 'reason' => 'no-peer', + 'message' => "{$group}/{$name} still owns pending entries and is the only consumer in its group; add another consumer first so its PEL can be handed over before deletion.", + ], 409); + return; + } + + // Pause the source worker (so XREADGROUP > can't pull any more + // entries onto its PEL mid-handover), hand its PEL off, then + // stop it, then XGROUP DELCONSUMER. The peer is left running so + // it can keep processing. 
+ // + // If the source worker doesn't acknowledge pause in time, abort + // the removal — DELCONSUMER would destroy any entries the worker + // pulled onto its PEL after we stopped polling but before it + // observed the pause flag. + $pauseResult = pause_workers_in_group($redis, $group, /*exceptName*/ $peer); + if (!empty($pauseResult['failed'])) { + // Resume whatever we did pause so a retry can succeed. + $resumeList = $pauseResult['touched']; + $resumeList[] = [$group, $name]; + resume_workers($redis, $resumeList); + send_json([ + 'removed' => false, + 'reason' => 'pause-failed', + 'message' => "{$group}/{$name} did not acknowledge the pre-handover pause within the deadline. The worker may still be pulling entries from XREADGROUP > onto its PEL; aborting the remove so XGROUP DELCONSUMER does not destroy in-flight work. Investigate the worker process and retry.", + ], 409); + return; + } + + // Run the handover. Any exception from XPENDING / XCLAIM must + // abort the removal — XGROUP DELCONSUMER would destroy the + // source's pending list. + try { + $claimedCount = $stream->handoverPending($group, $name, $peer); + } catch (\Throwable $e) { + // Resume the workers we paused so the demo isn't stuck. + $resumeList = $pauseResult['touched']; + $resumeList[] = [$group, $name]; + resume_workers($redis, $resumeList); + send_json([ + 'removed' => false, + 'reason' => 'handover-failed', + 'message' => "Handover from {$group}/{$name} to {$peer} failed before XGROUP DELCONSUMER could run: " . $e->getMessage() . ". 
{$group}/{$name} is still in the group; retry the remove or investigate the Redis error before deleting (DELCONSUMER would destroy the source consumer's pending entries).", + ], 409); + return; + } + + kill_worker($redis, $group, $name); + $stream->deleteConsumer($group, $name); + $redis->srem(WORKER_REGISTRY_KEY, [worker_key($group, $name)]); + ConsumerWorker::deleteWorkerState($redis, $group, $name); + + resume_workers($redis, [[$group, $name]]); // best-effort cleanup + // Resume any other paused workers in the group too. + foreach (list_workers($redis) as [$g, $n]) { + if ($g === $group) { + $redis->del([ + EventStream::workerKey($g, $n, 'paused'), + EventStream::workerKey($g, $n, 'idle'), + ]); + } + } + + send_json([ + 'removed' => true, + 'handed_over_to' => $peer, + 'handed_over_count' => $claimedCount, + ]); +} + +function handle_crash(EventStream $stream): void +{ + $params = read_form_data(); + $group = trim((string) ($params['group'] ?? '')); + $name = trim((string) ($params['name'] ?? '')); + $count = (int) ($params['count'] ?? 1); + if ($group === '' || $name === '') { + send_json(['error' => 'group and name are required'], 400); + return; + } + $registry = $stream->client(); + if (!$registry->sismember(WORKER_REGISTRY_KEY, worker_key($group, $name))) { + send_json(['error' => "unknown consumer {$group}/{$name}"], 404); + return; + } + $worker = new ConsumerWorker($stream, $group, $name); + $worker->crashNext($count); + send_json(['queued' => $count]); +} + +function handle_autoclaim(EventStream $stream): void +{ + $params = read_form_data(); + $group = trim((string) ($params['group'] ?? '')); + $consumer = trim((string) ($params['consumer'] ?? 
'')); + if ($group === '' || $consumer === '') { + send_json(['error' => 'group and consumer are required'], 400); + return; + } + $redis = $stream->client(); + if (!$redis->sismember(WORKER_REGISTRY_KEY, worker_key($group, $consumer))) { + send_json(['error' => "unknown consumer {$group}/{$consumer}"], 404); + return; + } + // The worker process is the one normally responsible for its + // reaps; but the demo's "XAUTOCLAIM to selected" button needs + // the action to be visible to the user in one request cycle. + // Pause the target so its in-process loop can't process the same + // entries simultaneously and credit them to `processed` instead + // of `reaped`, then run the reap here, then resume. + $redis->set(EventStream::workerKey($group, $consumer, 'paused'), '1'); + $deadline = microtime(true) + 1.0; + while (microtime(true) < $deadline) { + if ((string) $redis->get(EventStream::workerKey($group, $consumer, 'idle')) === '1') { + break; + } + usleep(20 * 1000); + } + $worker = new ConsumerWorker($stream, $group, $consumer); + $result = $worker->reapIdlePel(); + $redis->del([ + EventStream::workerKey($group, $consumer, 'paused'), + EventStream::workerKey($group, $consumer, 'idle'), + ]); + send_json([ + 'claimed' => $result['claimed'], + 'processed' => $result['processed'], + 'deleted' => $result['deleted_ids'], + 'min_idle_ms' => $stream->claimMinIdleMs(), + ]); +} + +function handle_trim(EventStream $stream): void +{ + $params = read_form_data(); + $maxlen = (int) ($params['maxlen'] ?? 0); + $deleted = $stream->trimMaxlen($maxlen); + send_json(['deleted' => $deleted, 'maxlen' => $maxlen]); +} + +function handle_replay(EventStream $stream): void +{ + $query = $_GET ?? []; + $start = (string) ($query['start'] ?? '-'); + if ($start === '') { + $start = '-'; + } + $end = (string) ($query['end'] ?? '+'); + if ($end === '') { + $end = '+'; + } + $limit = max(1, min(500, (int) ($query['count'] ?? 
20))); + $entries = $stream->replay($start, $end, $limit); + $rows = []; + foreach ($entries as [$id, $fields]) { + $rows[] = ['id' => $id, 'fields' => $fields]; + } + send_json([ + 'start' => $start, + 'end' => $end, + 'limit' => $limit, + 'entries' => $rows, + ]); +} + +// --------------------------------------------------------------------- +// State assembly +// --------------------------------------------------------------------- + +function build_state(EventStream $stream, $redis): array +{ + $streamInfo = $stream->infoStream(); + $groups = $stream->infoGroups(); + + $registered = list_workers($redis); + $byGroup = []; + foreach ($registered as [$g, $n]) { + $byGroup[$g][] = $n; + } + + $groupsDetail = []; + $pendingRows = []; + foreach ($groups as $group) { + $groupName = (string) $group['name']; + $consumerInfoRaw = $stream->infoConsumers($groupName); + $consumerInfo = []; + foreach ($consumerInfoRaw as $row) { + $consumerInfo[(string) $row['name']] = $row; + } + + $consumersDetail = []; + $seen = []; + + $owned = $byGroup[$groupName] ?? []; + foreach ($owned as $consumerName) { + $worker = new ConsumerWorker($stream, $groupName, $consumerName); + $status = $worker->status(); + $info = $consumerInfo[$consumerName] ?? ['pending' => 0, 'idle_ms' => 0]; + $consumersDetail[] = array_merge( + $status, + [ + 'pending' => (int) ($info['pending'] ?? 0), + 'idle_ms' => (int) ($info['idle_ms'] ?? 0), + 'recent' => $worker->recent(), + ] + ); + $seen[$consumerName] = true; + } + // Also include consumers that exist in Redis but not in our + // registry (e.g. orphaned after a restart). + foreach ($consumerInfo as $cName => $info) { + if (isset($seen[$cName])) { + continue; + } + $consumersDetail[] = [ + 'name' => $cName, + 'group' => $groupName, + 'processed' => 0, + 'reaped' => 0, + 'crashed_drops' => 0, + 'paused' => false, + 'crash_queued' => 0, + 'alive' => false, + 'pending' => (int) ($info['pending'] ?? 0), + 'idle_ms' => (int) ($info['idle_ms'] ?? 
0), + 'recent' => [], + ]; + } + usort($consumersDetail, function ($a, $b) { return strcmp((string) $a['name'], (string) $b['name']); }); + $group['consumers_detail'] = $consumersDetail; + $groupsDetail[] = $group; + + foreach ($stream->pendingDetail($groupName, 50) as $row) { + $row['group'] = $groupName; + $pendingRows[] = $row; + } + } + + $tailRaw = $stream->tail(10); + $tail = []; + foreach ($tailRaw as [$id, $fields]) { + $tail[] = ['id' => $id, 'fields' => $fields]; + } + + return [ + 'stream' => $streamInfo, + 'tail' => $tail, + 'groups' => $groupsDetail, + 'pending' => $pendingRows, + 'stats' => $stream->stats(), + ]; +} + +function fake_payload(): array +{ + $customers = ['alice', 'bob', 'carol', 'dan', 'erin']; + return [ + 'order_id' => 'o-' . random_int(1000, 9999), + 'customer' => $customers[array_rand($customers)], + 'amount' => sprintf('%.2f', mt_rand(500, 25000) / 100), + ]; +} + +// --------------------------------------------------------------------- +// HTTP plumbing +// --------------------------------------------------------------------- + +function read_form_data(): array +{ + $raw = file_get_contents('php://input') ?: ''; + $parsed = []; + parse_str($raw, $parsed); + return $parsed; +} + +function send_html(string $html, int $status = 200): void +{ + http_response_code($status); + header('Content-Type: text/html; charset=utf-8'); + echo $html; +} + +function send_json($payload, int $status = 200): void +{ + http_response_code($status); + header('Content-Type: application/json'); + echo json_encode($payload, JSON_UNESCAPED_SLASHES); +} + +function render_page(EventStream $stream): string +{ + $streamKey = htmlspecialchars($stream->streamKey(), ENT_QUOTES, 'UTF-8'); + $maxlen = (string) $stream->maxlenApprox(); + $claimIdle = (string) $stream->claimMinIdleMs(); + $html = <<<'HTML' + + + + + + Redis Streaming Demo + + + +
+
Predis + PHP built-in dev server
+

Redis Streaming Demo

+

+ Producers append events to a single Redis Stream + (__STREAM_KEY__). Two consumer groups read the same + stream independently: notifications shares its work + across two consumers, analytics processes the full + flow on its own. Acknowledge with XACK, recover + crashed deliveries with XAUTOCLAIM, replay any range + with XRANGE, and bound retention with XTRIM. +

+ +
+
+

Stream state

+
Loading...
+ + +
+ +
+

Produce events

+

Events are appended with XADD with an approximate + MAXLEN ~ __MAXLEN__ retention cap.

+ + + + + +
+ +
+

Replay range (XRANGE)

+

Reads a slice of history. Replay is independent of any + consumer group — no cursors move, no acks happen.

+ + + + + + + +
+ +
+

Trim retention (XTRIM)

+

Cap the stream length. Approximate trimming releases whole + macro-nodes, which is much cheaper than exact trimming.

+ + + +
+ +
+

Consumer groups

+
Loading...
+
+ +
+

Pending entries (XPENDING)

+

Entries delivered to a consumer that haven't been acked yet. + Idle time ≥ __CLAIM_IDLE__ ms is eligible for + XAUTOCLAIM.

+
Loading...
+
+ + +
+
+ +
+

Last result

+

Produce events, replay a range, or trigger an autoclaim to see results.

+
+
+ +
+
+ + + + +HTML; + return strtr($html, [ + '__STREAM_KEY__' => $streamKey, + '__MAXLEN__' => $maxlen, + '__CLAIM_IDLE__' => $claimIdle, + ]); +} diff --git a/content/develop/use-cases/streaming/redis-py/demo_server.py b/content/develop/use-cases/streaming/redis-py/demo_server.py index 32682c655e..acd9eae88d 100644 --- a/content/develop/use-cases/streaming/redis-py/demo_server.py +++ b/content/develop/use-cases/streaming/redis-py/demo_server.py @@ -474,7 +474,7 @@ if (!confirm("Drop the stream and re-seed the default groups?")) return; const r = await fetch("/reset", { method: "POST" }); const d = await r.json(); - setStatus(`Reset. ${d.groups} group(s) re-seeded.`, "ok"); + setStatus(`Reset. ${d.consumers} consumer(s) re-seeded.`, "ok"); await refresh(); }); @@ -665,7 +665,7 @@ def do_POST(self) -> None: return if parsed.path == "/reset": count = self.demo.reset() - self._send_json({"groups": count}, 200) + self._send_json({"consumers": count}, 200) return self.send_error(404) diff --git a/content/develop/use-cases/streaming/ruby/_index.md b/content/develop/use-cases/streaming/ruby/_index.md new file mode 100644 index 0000000000..7661276991 --- /dev/null +++ b/content/develop/use-cases/streaming/ruby/_index.md @@ -0,0 +1,461 @@ +--- +categories: +- docs +- develop +- stack +- oss +- rs +- rc +description: Implement a Redis event-streaming pipeline in Ruby with redis-rb +linkTitle: redis-rb example (Ruby) +title: Redis streaming with redis-rb +weight: 8 +--- + +This guide shows you how to build a Redis-backed event-streaming pipeline in Ruby with the [`redis-rb`]({{< relref "/develop/clients/ruby" >}}) gem. It includes a small local web server built on Ruby's standard `webrick` library so you can produce events into a single Redis Stream, watch two independent consumer groups read it at their own pace, and recover stuck deliveries with `XAUTOCLAIM` after simulating a consumer crash. 
+ +## Overview + +A Redis Stream is an append-only log of field/value entries with auto-generated, time-ordered IDs. Producers append with [`XADD`]({{< relref "/commands/xadd" >}}); consumers belong to *consumer groups* and read with [`XREADGROUP`]({{< relref "/commands/xreadgroup" >}}). The group as a whole tracks a single `last-delivered-id` cursor, and each consumer gets its own pending-entries list (PEL) of messages it has been handed but not yet acknowledged. Once a consumer has processed an entry it calls [`XACK`]({{< relref "/commands/xack" >}}) to clear the entry from its PEL; entries left unacknowledged past an idle threshold can be reassigned to a healthy consumer with [`XAUTOCLAIM`]({{< relref "/commands/xautoclaim" >}}). + +That gives you: + +* Ordered, durable history that many independent consumer groups can read at their own pace +* At-least-once delivery, with per-consumer pending lists and automatic recovery of crashed consumers +* Horizontal scaling within a group — add a consumer and Redis automatically splits the work +* Replay of any range with [`XRANGE`]({{< relref "/commands/xrange" >}}), independent of consumer-group state +* Bounded retention through [`XADD MAXLEN ~`]({{< relref "/commands/xadd" >}}) or + [`XTRIM MINID ~`]({{< relref "/commands/xtrim" >}}), without a separate cleanup job + +In this example, producers append order events (`order.placed`, `order.paid`, `order.shipped`, `order.cancelled`) to a single stream at `demo:events:orders`. Two consumer groups read the same stream: + +* **`notifications`** — two consumers (`worker-a`, `worker-b`) sharing the work, modelling a fan-out worker pool. +* **`analytics`** — one consumer (`worker-c`) processing the full event flow on its own. + +## How it works + +The flow looks like this: + +1. The application calls `stream.produce(event_type, payload)` which runs [`XADD`]({{< relref "/commands/xadd" >}}) with an approximate [`MAXLEN ~`]({{< relref "/commands/xadd" >}}) cap. 
Redis assigns an auto-generated time-ordered ID. +2. Each consumer thread loops on [`XREADGROUP`]({{< relref "/commands/xreadgroup" >}}) with the special ID `>` (meaning "deliver entries this group has not yet delivered to anyone") and a short block timeout. +3. After processing each entry, the consumer calls [`XACK`]({{< relref "/commands/xack" >}}) so Redis can drop it from the group's pending list. +4. If a consumer is killed (or crashes) before acking, its entries sit in the group's PEL. A periodic [`XAUTOCLAIM`]({{< relref "/commands/xautoclaim" >}}) sweep reassigns idle entries to a healthy consumer. +5. Anyone — including code outside the consumer groups — can read history with [`XRANGE`]({{< relref "/commands/xrange" >}}) without affecting any group's cursor. + +Each consumer group has its own cursor (`last-delivered-id`) and its own pending list, so the two groups in this demo process the same events without coordinating with each other. + +## The event-stream helper + +The `RedisEventStream` class wraps the stream operations +([source](https://github.com/redis/docs/blob/main/content/develop/use-cases/streaming/ruby/event_stream.rb)): + +```ruby +require 'redis' +require_relative 'event_stream' + +redis = Redis.new(host: 'localhost', port: 6379) +stream = RedisEventStream.new( + redis: redis, + stream_key: 'demo:events:orders', + maxlen_approx: 2000, # retention guardrail + claim_min_idle_ms: 5000, # XAUTOCLAIM threshold +) + +# Producer +stream_id = stream.produce( + 'order.placed', + { 'order_id' => 'o-1234', 'customer' => 'alice', 'amount' => '49.50' }, +) + +# Consumer group + one consumer +stream.ensure_group('notifications', '0-0') +entries = stream.consume('notifications', 'worker-a', count: 10, block_ms: 500) +entries.each do |entry_id, fields| + handle(fields) # your processing + stream.ack('notifications', [entry_id]) # XACK +end + +# Recover stuck PEL entries by reaping them into a healthy consumer. 
+# The textbook pattern: each consumer periodically calls XAUTOCLAIM +# with itself as the target and processes whatever it claimed. +# `ConsumerWorker#reap_idle_pel` wraps that flow; the low-level helper +# `stream.autoclaim(group, target_name)` is also available if you +# want to drive XAUTOCLAIM directly. +result = worker_b.reap_idle_pel +# result == { claimed: N, processed: M, deleted_ids: [...] } +# deleted_ids are PEL entries whose payload was already trimmed. +# Redis 7+ has already removed those slots from the PEL, so no XACK +# is needed -- log them and route to a dead-letter store for audit. + +# Replay history (independent of any group's cursor) +stream.replay(start_id: '-', end_id: '+', count: 50).each do |entry_id, fields| + puts "#{entry_id} #{fields.inspect}" +end +``` + +### Data model + +Each event is a single stream entry — a flat hash of field/value strings — with an auto-generated time-ordered ID: + +```text +demo:events:orders + 1716998413541-0 type=order.placed order_id=o-1234 customer=alice amount=49.50 ts_ms=... + 1716998413542-0 type=order.paid order_id=o-1234 customer=alice amount=49.50 ts_ms=... + 1716998413542-1 type=order.shipped order_id=o-1235 customer=bob amount=12.00 ts_ms=... + ... +``` + +The ID is `{milliseconds}-{sequence}`, monotonically increasing within the stream, so you can range-query by approximate wall-clock time without an extra index. (IDs are ordered within a stream, not across streams — two events appended to different streams at the same millisecond can produce the same ID.) The implementation uses: + +* [`XADD ... 
MAXLEN ~ n`]({{< relref "/commands/xadd" >}}), pipelined, for batch production with a retention cap +* [`XREADGROUP`]({{< relref "/commands/xreadgroup" >}}) with the special ID `>` for fresh deliveries to a consumer +* [`XACK`]({{< relref "/commands/xack" >}}) on every processed entry +* [`XAUTOCLAIM`]({{< relref "/commands/xautoclaim" >}}) for sweeping idle pending entries to a healthy consumer +* [`XRANGE`]({{< relref "/commands/xrange" >}}) for replay and audit +* [`XPENDING`]({{< relref "/commands/xpending" >}}) for inspecting the per-group pending list +* [`XINFO STREAM`]({{< relref "/commands/xinfo-stream" >}}), + [`XINFO GROUPS`]({{< relref "/commands/xinfo-groups" >}}), and + [`XINFO CONSUMERS`]({{< relref "/commands/xinfo-consumers" >}}) for surface-level observability +* [`XTRIM`]({{< relref "/commands/xtrim" >}}) for explicit retention enforcement + +## Producing events + +`produce_batch` pipelines `XADD` calls in a single round trip. Each call carries an approximate `MAXLEN ~` cap so the stream stays bounded as it rolls forward: + +```ruby +def produce_batch(events) + events = events.to_a + return [] if events.empty? + ids = @redis.pipelined do |pipe| + events.each do |event_type, payload| + fields = encode_fields(event_type, payload) + pipe.xadd(@stream_key, fields, + maxlen: @maxlen_approx, approximate: true) + end + end + @stats_lock.synchronize { @produced_total += ids.length } + ids +end +``` + +The `~` flavour of `MAXLEN` lets Redis trim at a macro-node boundary, which is much cheaper than exact trimming and is what you want when the cap is a retention *guardrail*, not a hard size constraint. With 300 events produced and `MAXLEN ~ 50`, you might end up with 100 entries left — Redis released the oldest whole macro-node and stopped. The next `XADD` will keep length stable. + +If you genuinely need an exact cap (rare), drop `approximate: true`. The performance difference is significant on busy streams. 
+ +## Reading with a consumer group + +Each consumer in a group runs the same `XREADGROUP` loop. The special ID `>` means "deliver entries this group has not yet delivered to *anyone*": + +```ruby +def consume(group, consumer, count: 10, block_ms: 500) + result = @redis.xreadgroup(group, consumer, @stream_key, '>', + count: count, block: block_ms) + flatten_entries(result) +end +``` + +`block_ms` makes the call efficient even when the stream is idle: the client parks on the server until either an entry arrives or the timeout expires, so consumers don't busy-loop. + +Reading with an explicit ID like `0-0` instead of `>` does something different — it replays entries already delivered to *this* consumer name (its private PEL). That is the canonical recovery path when the same consumer restarts: catch up on its own pending entries first, then resume reading new ones. + +A `redis-rb` `Redis` instance holds a single TCP connection and serialises every command through a monitor; a blocking `XREADGROUP BLOCK 500` parks that monitor for up to 500 ms, so every consumer must use its *own* `Redis` connection or its blocking read will park every other thread sharing the same client. The demo gives each `ConsumerWorker` a dedicated read connection and shares one connection across the HTTP handlers (which only issue non-blocking commands). + +## Acknowledging entries + +Once the consumer has processed an entry, `XACK` tells Redis it can drop the entry from the group's pending list: + +```ruby +def ack(group, ids) + ids = Array(ids) + return 0 if ids.empty? + n = @redis.xack(@stream_key, group, ids).to_i + @stats_lock.synchronize { @acked_total += n } + n +end +``` + +This is the linchpin of at-least-once delivery: an entry that is never acked stays in the PEL until a claim moves it elsewhere. If your consumer thread crashes between processing and ack, the next claim sweep picks the entry back up. 
The one caveat is retention: `XADD MAXLEN ~` and `XTRIM` can release the entry's *payload* even while its ID is still in the PEL. The next `XAUTOCLAIM` returns those IDs in its `deleted` list and removes them from the PEL inside the same command — the entry cannot be retried, so the caller should log it and route to a dead-letter store for audit. The example handles this explicitly in `reap_idle_pel` further down. + +The trade-off is the opposite of pub/sub: a slow or crashed consumer doesn't lose messages, but it does mean your downstream system must be idempotent. If you process an order twice because the first attempt died after the side effect but before the ack, the second attempt must be safe. + +## Multiple consumer groups, one stream + +The big difference between Redis Streams and a job queue is that any number of independent consumer groups can read the same stream. The demo sets up two groups on `demo:events:orders`: + +```ruby +stream.ensure_group('notifications', '0-0') +stream.ensure_group('analytics', '0-0') +``` + +Each group has its own cursor. Producing 5 events results in `notifications` and `analytics` each receiving all 5, with no coordination between them. Within `notifications`, the work is split across `worker-a` and `worker-b`: Redis hands each `XREADGROUP` call whatever entries are not yet delivered to anyone in the group, so adding a second worker doubles throughput without any rebalance logic. + +The `'0-0'` start ID means "deliver everything in the stream from the beginning" — useful in a demo and for fresh groups bootstrapping from history. In production, a brand-new group reading a long-existing stream usually starts at `'$'` ("only events after this point") and uses [`XRANGE`]({{< relref "/commands/xrange" >}}) explicitly if it needs history. 
+ +## Recovering crashed consumers with XAUTOCLAIM + +The demo's "Crash next 3" button tells a chosen consumer to drop its next three deliveries on the floor without acking them — the same effect as a worker process dying mid-message. Those entries stay in the group's PEL with their delivery counter incremented. Once they have been idle for at least `claim_min_idle_ms`, any healthy consumer in the group can rescue them by calling `XAUTOCLAIM` *with itself as the target*. `ConsumerWorker#reap_idle_pel` wraps that pattern: + +```ruby +def reap_idle_pel + result = @shared_stream.autoclaim(@group, @name, + page_count: 100, max_pages: 10) + claimed = result[:claimed] + deleted = result[:deleted_ids] + processed = 0 + claimed.each do |entry_id, fields| + begin + handle_entry(entry_id, fields) + processed += 1 + rescue StandardError => exc + warn "[#{@group}/#{@name}] reap failed on #{entry_id}: #{exc}" + end + end + @lock.synchronize { @reaped += processed } + { claimed: claimed.length, processed: processed, deleted_ids: deleted } +end +``` + +The underlying `stream.autoclaim` helper pages through the group's PEL with `XAUTOCLAIM`'s continuation cursor. `redis-rb` 5.x exposes a typed `xautoclaim(...)` wrapper, but it discards the third element of the reply (the list of IDs whose payload was already trimmed). 
The helper drops down to the raw command via `redis.call(...)` so the deleted-IDs list survives: + +```ruby +def autoclaim(group, consumer, page_count: 100, start_id: '0-0', + max_pages: 10) + claimed_all = [] + deleted_all = [] + cursor = start_id + max_pages.times do + reply = @redis.call('XAUTOCLAIM', @stream_key, group, consumer, + @claim_min_idle_ms.to_s, cursor, + 'COUNT', page_count.to_s) + next_cursor = reply[0] + claimed = parse_entries(reply[1]) + deleted = Array(reply[2]) + claimed_all.concat(claimed) + deleted_all.concat(deleted) + break if next_cursor == '0-0' + cursor = next_cursor + end + @stats_lock.synchronize { @claimed_total += claimed_all.length } + { claimed: claimed_all, deleted_ids: deleted_all } +end +``` + +A single `XAUTOCLAIM` call scans up to `page_count` PEL entries starting at `start_id`, reassigns the ones idle for at least `min_idle_time` to the named consumer, and returns a continuation cursor in the first slot of the reply. For a full sweep, loop until the cursor returns to `'0-0'` (with a `max_pages` safety net so one call cannot monopolise a very large PEL). The delivery counter is incremented on every claim — after a few cycles you can use it to spot a *poison-pill* message that crashes every consumer that touches it, and route it to a dead-letter stream so the bad entry stops cycling. (New entries keep flowing past the poison pill — `XREADGROUP >` still delivers fresh work — but the bad entry's repeated reclaim wastes consumer time and keeps the PEL larger than it needs to be.) + +The `deleted_ids` list contains PEL entry IDs whose stream payload was already trimmed by the time the claim ran (typically because `MAXLEN ~` retention outran a slow consumer). `XAUTOCLAIM` removes those dangling slots from the PEL itself, so the caller does *not* need to `XACK` them — but the entries cannot be retried either, so log and route them to a dead-letter store for offline inspection. 
Redis 7.0 introduced this third return element; the example requires Redis 7.0+ for that reason. + +`reap_idle_pel` is the right primitive for the recovery path because it claims and processes in one step: every entry the call returned is now in *this* consumer's PEL, so the same consumer is responsible for processing and acking it. In production each consumer thread runs `reap_idle_pel` periodically (every few seconds, on a timer) so a crashed peer's entries never sit invisibly. The demo exposes it as a manual button so you can trigger the reap after waiting for the idle threshold. + +`XCLAIM` (singular, no auto) does the same thing for a specific list of entry IDs you already have in hand — useful when you want to take ownership of one known stuck entry, or when you need to move a specific consumer's PEL to a peer (the case the demo's "Remove consumer" button handles via `handover_pending`). `XAUTOCLAIM` cannot filter by source consumer, so it cannot be used for a per-consumer handover. + +## Replay with XRANGE + +`XRANGE` reads a slice of history. It is completely independent of any consumer group — no cursors move, no acks happen — so it is safe to call any number of times, from any process: + +```ruby +def replay(start_id: '-', end_id: '+', count: 100) + rows = @redis.xrange(@stream_key, start_id, end_id, count: count) + rows.map { |entry_id, fields| [entry_id, fields] } +end +``` + +The special IDs `-` and `+` mean "from the very beginning" and "to the very end". You can also pass real IDs (`1716998413541-0`) or just the millisecond part (`1716998413541`, which Redis interprets as "any entry with this timestamp"). + +Typical uses: + +* **Bootstrapping a new projection** — read the entire stream from `-` and build a derived view in another store (a search index, a SQL table, a different cache). Doing this against a consumer group would consume the entries; `XRANGE` lets you do it without disrupting live consumers. 
+* **Auditing recent activity** — read the last few minutes by ID range without touching any group cursor. +* **Debugging** — fetch one specific entry by its ID, or a tight range around an incident timestamp, to see exactly what producers wrote. + +## The consumer worker thread + +`ConsumerWorker` wraps the `XREADGROUP` → process → `XACK` loop in a Ruby `Thread` +([source](https://github.com/redis/docs/blob/main/content/develop/use-cases/streaming/ruby/consumer_worker.rb)): + +```ruby +def run + until @stop + if @lock.synchronize { @paused } + sleep 0.05 + next + end + begin + entries = @read_stream.consume(@group, @name, count: 10, block_ms: 500) + rescue StandardError => exc + # Don't kill the thread on a transient Redis error; a real + # consumer would log this and back off. + warn "[#{@group}/#{@name}] read failed: #{exc}" + sleep 0.5 + next + end + + entries.each do |entry_id, fields| + dispatch(entry_id, fields) + end + end +end +``` + +`dispatch` either acks (the normal path) or, when the demo has asked the worker to "crash", drops the entry on the floor and increments a counter so the UI can show what is currently in the PEL waiting to be claimed. The `begin/rescue` block around the per-entry handler is important: a transient failure (typically `XACK` against a flaky network) must not kill the worker thread — otherwise every other entry in this consumer's PEL would sit invisibly waiting for `XAUTOCLAIM`. The entry stays unacked instead; the next `reap_idle_pel` call (here or on any consumer in the group) recovers it. + +Each `ConsumerWorker` is constructed with two `RedisEventStream` handles: + +* A `read_stream` with its own dedicated `Redis` connection. The blocking `XREADGROUP BLOCK 500` would otherwise park the demo server's primary connection (and any HTTP handler sharing it) for up to 500 ms. +* A `shared_stream` pointing at the demo server's primary connection. 
Non-blocking commands (`XACK`, `XAUTOCLAIM`) run through this handle so stats land in a single aggregate counter the UI can render. + +Recovery of stuck PEL entries — this consumer's, after a restart, or another consumer's, after a crash — runs through a separate `reap_idle_pel` method rather than the read loop. That method calls `XAUTOCLAIM` with this consumer as the target, then processes whatever was claimed in the same flow as new entries. This is the textbook Streams pattern: each consumer is its own reaper, running `XAUTOCLAIM(self)` periodically (or on demand) so a crashed peer's entries never sit invisibly in the PEL. The demo's "XAUTOCLAIM to selected" button calls `reap_idle_pel` on the chosen consumer; in production you would run it from a timer every few seconds. + +Note that the worker's main read loop deliberately does *not* call `XREADGROUP 0` to drain its own PEL on every iteration. That would re-deliver every pending entry continuously and *reset its idle counter to zero* each time, which would keep crashed entries below the `XAUTOCLAIM` threshold forever. Using `XAUTOCLAIM(self)` as the recovery primitive — which only fires for entries idle longer than `min_idle_time` — avoids that whole class of bug. + +The pause and crash levers exist only for the demo. A real consumer is just the read-process-ack loop — everything else in this class is instrumentation. + +## Prerequisites + +* Redis 7.0 or later. `XAUTOCLAIM` was added in Redis 6.2, but its reply gained a third + element (the list of deleted IDs) in 7.0; the example relies on that shape. +* Ruby 3.0 or later. +* The `redis` gem at version 5.x. `webrick` was removed from Ruby's standard library in + Ruby 3.0, so install both: + + ```bash + gem install redis webrick + ``` + +If your Redis server is running elsewhere, start the demo with `--redis-host` and `--redis-port`. + +## Running the demo + +### Get the source files + +The demo consists of three Ruby files. 
Download them from the [`ruby` source folder](https://github.com/redis/docs/tree/main/content/develop/use-cases/streaming/ruby) on GitHub, or grab them with `curl`: + +```bash +mkdir streaming-demo && cd streaming-demo +BASE=https://raw.githubusercontent.com/redis/docs/main/content/develop/use-cases/streaming/ruby +curl -O $BASE/event_stream.rb +curl -O $BASE/consumer_worker.rb +curl -O $BASE/demo_server.rb +``` + +### Start the demo server + +From that directory, install the gems and run the server: + +```bash +gem install redis webrick +ruby demo_server.rb +``` + +You should see: + +```text +Deleting any existing data at key 'demo:events:orders' for a clean demo run (pass --no-reset to keep it). +Redis streaming demo server listening on http://127.0.0.1:8787 +Using Redis at localhost:6379 with stream key 'demo:events:orders' (MAXLEN ~ 2000) +Seeded 3 consumer(s) across 2 group(s) +``` + +By default the demo wipes the configured stream key on startup so each run starts from a clean state. Pass `--no-reset` to keep any existing data at the key (useful when re-running against the same stream to inspect prior state), or `--stream-key ` to point the demo at a different key entirely. + +Open [http://127.0.0.1:8787](http://127.0.0.1:8787) in a browser. You can: + +* **Produce** any number of events of a chosen type (or random types). Watch the stream length grow and the tail update. +* See each **consumer group**: its `last-delivered-id`, the size of its pending list, and the consumers in it. Each consumer shows its processed count, pending count, and idle time. +* **Add or remove** consumers within a group at runtime to see Redis split the work across the new shape. +* Click **Crash next 3** on a consumer to drop its next three deliveries — the same effect as a worker process dying after `XREADGROUP` but before `XACK`. Watch the **Pending entries (XPENDING)** panel fill up. 
+* Wait until the idle time exceeds the threshold (default 5000 ms), pick a healthy target consumer, and click **XAUTOCLAIM to selected** — the stuck entries are reassigned and the delivery counter increments. +* **Replay (XRANGE)** any range to confirm the full history is independent of consumer-group state. +* **XTRIM** with an approximate `MAXLEN` to bound retention. Note that an approximate trim only releases whole macro-nodes — `MAXLEN ~ 50` on a small stream may not delete anything; on a 300-entry stream it typically lands at around 100. +* Click **Reset demo** to drop the stream and re-seed the default groups. + +## Production usage + +### Pick retention by length or by minimum ID + +The demo uses `MAXLEN ~` on every `XADD`. Two alternatives are worth considering: + +* `MINID ~ ` — keep only entries newer than an ID. If you want "the last 24 hours", compute the wall-clock cutoff and pass `XTRIM MINID ~ -0`. This is the right pattern when retention is time-bounded. +* No cap on `XADD` plus a periodic `XTRIM` job — useful if your producer is hot and the per-`XADD` work has to stay minimal, or if retention rules are complex (a separate process can also factor in consumer-group lag). + +In all three cases the trimming is approximate by default. Use exact trimming (`MAXLEN n` or `MINID id` without `~`) only when you genuinely need an exact count. + +### Don't let consumer-group lag silently grow + +`XINFO GROUPS` reports each group's `lag` (entries the group has not yet read) and `pending` (entries delivered but not acked). In production, alert on either of these crossing a threshold — a steadily growing pending count usually means consumers are crashing without `XAUTOCLAIM` running, and a growing lag means consumers can't keep up with producers. + +The same applies inside a group: `XINFO CONSUMERS` reports per-consumer pending counts and idle times, so you can spot one slow consumer holding entries that the rest of the group is waiting on. 
+ +### Make consumer logic idempotent + +`XAUTOCLAIM` can re-deliver an entry to a different consumer after a crash. If your processing has side effects (sending email, charging a card, updating a downstream store), make sure the same entry processed twice gives the same result — use an idempotency key, an upsert with conditional check, or a once-per-id guard table. Redis Streams cannot give you exactly-once semantics on its own. + +### Bound the delivery counter as a poison-pill signal + +`XPENDING` returns each entry's delivery count, incremented on every claim. If an entry has been delivered (and dropped) several times, the next consumer is unlikely to fare better. After some threshold — `deliveries >= 5`, say — route the entry to a *dead-letter stream*, ack it on the original group, and alert. New entries keep flowing past a poison pill (`XREADGROUP >` still delivers fresh work), but the bad entry's repeated reclaim wastes consumer time and keeps the PEL bigger than it needs to be — without a DLQ threshold it can also slowly trip retention/lag alerts. + +### Partition by tenant or entity for scale + +A single Redis Stream is a single key, and on a Redis Cluster a single key lives on a single shard. If your throughput exceeds what one shard can handle, partition the stream — for example by tenant ID (`events:orders:{tenant_a}`, `events:orders:{tenant_b}`) — so different tenants land on different shards. Hash-tags (`{tenant_a}`) keep all related streams on the same shard if you need to multi-stream atomically. + +Per-entity partitioning (`events:order:{order_id}`) is the canonical pattern when you treat each entity's stream as the event-sourcing log for that entity: every state change for one order goes on its own stream, which is also bounded in size by the entity's lifetime. + +### Use a separate consumer pool per group + +The demo runs every consumer in one Ruby process. 
In production each consumer group is usually its own deployment — its own pool of pods or VMs — so a slow projection in `analytics` cannot pull `notifications` workers off their stream. Each pod runs one consumer thread per CPU core (mind the GVL: pure-Ruby consumers won't go in parallel on MRI), with `XAUTOCLAIM` either embedded in the consumer loop (every N reads, claim idle entries to self) or run by a separate reaper. + +### Use one `Redis` connection per consumer thread + +A `redis-rb` `Redis` instance holds a single TCP socket and serialises every command through an internal monitor. A blocking call like `XREADGROUP BLOCK 500` parks that monitor for the full block window, so a second thread issuing *any* command — even a simple `XACK` — would queue behind it. The demo gives each `ConsumerWorker` a dedicated read connection for the blocking `XREADGROUP` loop and routes its `XACK` and `XAUTOCLAIM` calls through the demo server's shared connection so the aggregate stats counter stays in sync. In production with `redis-rb`, use a `ConnectionPool` (from the `connection_pool` gem) for the non-blocking side and a dedicated per-consumer `Redis` for the blocking reads. + +### Don't read with XREAD (no group) and then try to ack + +`XREAD` and `XREADGROUP` are different mechanisms. `XREAD` is a tail-the-log read with no consumer-group state — entries are not added to any PEL, and you cannot `XACK` them. If you want at-least-once delivery and crash recovery, you must read through a consumer group. + +`XREAD` is still useful for read-only tail clients (a UI streaming events, a debugger, a `tail -f`-style command-line tool). It's just not part of the at-least-once path. 
+ +### Inspect the stream directly with redis-cli + +When testing or troubleshooting, inspect the stream directly to confirm the consumer state is what you expect: + +```bash +# Stream summary +redis-cli XLEN demo:events:orders +redis-cli XINFO STREAM demo:events:orders + +# Group cursors and pending counts +redis-cli XINFO GROUPS demo:events:orders + +# Consumers within a group +redis-cli XINFO CONSUMERS demo:events:orders notifications + +# Pending entries with idle time and delivery count +redis-cli XPENDING demo:events:orders notifications - + 20 + +# Tail the stream live (no consumer-group state — like tail -f) +redis-cli XREAD BLOCK 0 STREAMS demo:events:orders '$' + +# Replay a range +redis-cli XRANGE demo:events:orders - + COUNT 50 +``` + +If a group's `lag` is growing while consumers' `idle` times are short, consumers are healthy but producers are outpacing them — add more consumers. If `pending` is growing while `lag` is small, consumers are *receiving* entries but not *acking* them — either they are crashing mid-message or your acking logic has a bug. + +## Learn more + +This example uses the following Redis commands: + +* [`XADD`]({{< relref "/commands/xadd" >}}) to append an event with an approximate `MAXLEN` cap. +* [`XREADGROUP`]({{< relref "/commands/xreadgroup" >}}) to read new entries for a consumer in a group. +* [`XACK`]({{< relref "/commands/xack" >}}) to acknowledge a processed entry. +* [`XAUTOCLAIM`]({{< relref "/commands/xautoclaim" >}}) to reassign idle pending entries to a healthy consumer. +* [`XCLAIM`]({{< relref "/commands/xclaim" >}}) to take ownership of a specific list of pending entry IDs by hand (used by `handover_pending` to move a leaving consumer's PEL to a peer, since `XAUTOCLAIM` has no source-consumer filter). +* [`XRANGE`]({{< relref "/commands/xrange" >}}) for replay and audit, independent of consumer-group state. 
+* [`XPENDING`]({{< relref "/commands/xpending" >}}) to inspect the per-group pending list with idle times and delivery counts. +* [`XTRIM`]({{< relref "/commands/xtrim" >}}) for explicit retention enforcement. +* [`XGROUP CREATE`]({{< relref "/commands/xgroup-create" >}}) and + [`XGROUP DELCONSUMER`]({{< relref "/commands/xgroup-delconsumer" >}}) to manage groups and consumers. +* [`XINFO STREAM`]({{< relref "/commands/xinfo-stream" >}}), + [`XINFO GROUPS`]({{< relref "/commands/xinfo-groups" >}}), and + [`XINFO CONSUMERS`]({{< relref "/commands/xinfo-consumers" >}}) for observability. + +See the [`redis-rb` documentation]({{< relref "/develop/clients/ruby" >}}) for the full client reference, and the [Streams overview]({{< relref "/develop/data-types/streams" >}}) for the deeper conceptual model — consumer groups, the PEL, claim semantics, capped streams, and the differences with Kafka partitions. diff --git a/content/develop/use-cases/streaming/ruby/consumer_worker.rb b/content/develop/use-cases/streaming/ruby/consumer_worker.rb new file mode 100644 index 0000000000..75007102b7 --- /dev/null +++ b/content/develop/use-cases/streaming/ruby/consumer_worker.rb @@ -0,0 +1,251 @@ +# Background consumer thread for a single consumer in a consumer group. +# +# Each worker owns a daemon thread that loops on `XREADGROUP >` with a +# short block timeout and acks every entry it processes. Recovery of +# stuck PEL entries (this consumer's, or anyone else's) happens through +# `reap_idle_pel`, which is the textbook Streams pattern: each consumer +# periodically (or on demand) calls `XAUTOCLAIM` with itself as the +# target, then processes whatever it claimed. The demo's "XAUTOCLAIM +# to selected" button is exactly that call. +# +# Two demo-only levers are wired into the loop: +# +# * `pause` parks the worker (so its pending entries age into the +# `XAUTOCLAIM` window without being consumed by `>` reads). 
+# * `crash_next(n)` tells the worker to drop its next `n` deliveries
+#   on the floor without acking them -- the same effect as a worker
+#   process dying mid-message. Those entries stay in the group's PEL
+#   until `reap_idle_pel` recovers them.
+#
+# Real consumers do not need either lever; they only need
+# `XREADGROUP` -> process -> `XACK` in the main loop and a periodic
+# `reap_idle_pel` call to recover stuck entries.
+
+require 'thread'
+
+require_relative 'event_stream'
+
+# One consumer in a consumer group, running in its own Ruby Thread.
+#
+# Each worker is constructed with two `RedisEventStream` handles:
+#
+# * `read_stream` -- a dedicated stream/connection used only for the
+#   blocking `XREADGROUP` loop. redis-rb serialises every command on
+#   a connection behind a monitor, and `XREADGROUP BLOCK n` holds
+#   that connection for the whole block window, so each worker
+#   needs its own.
+# * `shared_stream` -- the demo server's primary stream. The worker
+#   runs `XACK` and `XAUTOCLAIM` through this handle so the demo's
+#   aggregate `acked_total` and `claimed_total` counters reflect
+#   every worker's progress and land in the same stats hash the
+#   HTTP layer reads. Only non-blocking commands run on this
+#   connection.
+class ConsumerWorker
+  attr_reader :group, :name
+
+  def initialize(read_stream:, shared_stream:, group:, name:,
+                 process_latency_ms: 25, recent_capacity: 20)
+    @read_stream = read_stream
+    @shared_stream = shared_stream
+    @group = group
+    @name = name
+    @process_latency_ms = process_latency_ms
+    @recent_capacity = recent_capacity
+
+    @lock = Mutex.new
+    @recent = []
+    @processed = 0
+    @reaped = 0
+    @crashed_drops = 0
+    @crash_next = 0
+    @paused = false
+
+    @stop = true
+    @thread = nil
+  end
+
+  # ------------------------------------------------------------------
+  # Lifecycle
+  # ------------------------------------------------------------------
+
+  def start
+    return if @thread && @thread.alive?
+ @stop = false + @thread = Thread.new { run } + @thread.name = "consumer-#{@group}-#{@name}" if @thread.respond_to?(:name=) + end + + def stop(timeout: 1.0) + @stop = true + @thread&.join(timeout) + end + + # ------------------------------------------------------------------ + # Demo levers + # ------------------------------------------------------------------ + + def pause + @lock.synchronize { @paused = true } + end + + def resume + @lock.synchronize { @paused = false } + end + + # Drop the next `count` deliveries without acking them. + # + # The entries stay in the group's PEL with their delivery counter + # incremented, so `XAUTOCLAIM` can recover them once they exceed + # the idle threshold. + def crash_next(count) + n = [count.to_i, 0].max + @lock.synchronize { @crash_next += n } + end + + # ------------------------------------------------------------------ + # Introspection + # ------------------------------------------------------------------ + + def recent + @lock.synchronize { @recent.dup } + end + + def status + @lock.synchronize do + { + 'name' => @name, + 'group' => @group, + 'processed' => @processed, + 'reaped' => @reaped, + 'crashed_drops' => @crashed_drops, + 'paused' => @paused, + 'crash_queued' => @crash_next, + 'alive' => !@thread.nil? && @thread.alive?, + } + end + end + + # ------------------------------------------------------------------ + # Recovery + # ------------------------------------------------------------------ + + # Run `XAUTOCLAIM` into self and process the claimed entries. + # + # Returns a hash with `claimed`, `processed`, and `deleted_ids`. + # Safe to call from any thread -- the heavy lifting is + # `stream.autoclaim` (a Redis call) and the sequential per-entry + # dispatch. + # + # `deleted_ids` are PEL entries whose stream payload was already + # trimmed by `MAXLEN ~` / `XTRIM` before the sweep ran. 
Redis 7+ + # removes them from the PEL inside `XAUTOCLAIM` itself, so the + # caller does not have to `XACK` them; they are reported so the + # caller can route them to a dead-letter store. + def reap_idle_pel + # XAUTOCLAIM is non-blocking and reap_idle_pel is invoked from + # the HTTP handler thread, so it runs on the shared (demo + # server) connection. The blocking XREADGROUP on the worker's + # own connection is unaffected, and `claimed_total` lands in + # the same stats hash the UI reads. + result = @shared_stream.autoclaim(@group, @name, + page_count: 100, max_pages: 10) + claimed = result[:claimed] + deleted = result[:deleted_ids] + processed = 0 + claimed.each do |entry_id, fields| + begin + handle_entry(entry_id, fields) + processed += 1 + rescue StandardError => exc + warn "[#{@group}/#{@name}] reap failed on #{entry_id}: #{exc}" + end + end + @lock.synchronize { @reaped += processed } + { claimed: claimed.length, processed: processed, deleted_ids: deleted } + end + + private + + def run + until @stop + if @lock.synchronize { @paused } + sleep 0.05 + next + end + begin + entries = @read_stream.consume(@group, @name, count: 10, block_ms: 500) + rescue StandardError => exc + # Don't kill the thread on a transient Redis error; a real + # consumer would log this and back off. + warn "[#{@group}/#{@name}] read failed: #{exc}" + sleep 0.5 + next + end + + entries.each do |entry_id, fields| + dispatch(entry_id, fields) + end + end + rescue StandardError => exc + warn "[#{@group}/#{@name}] worker thread crashed: #{exc.class}: #{exc.message}" + end + + def dispatch(entry_id, fields) + sleep(@process_latency_ms / 1000.0) if @process_latency_ms.to_i > 0 + begin + handle_entry(entry_id, fields) + rescue StandardError => exc + # A failure here (typically XACK against Redis) must not kill + # the worker thread -- that would silently halt this consumer + # while every other entry sat in its PEL waiting for + # XAUTOCLAIM. 
The entry stays unacked; the next reap call + # (here or on any consumer in the group) can recover it once + # it exceeds the idle threshold. + warn "[#{@group}/#{@name}] failed to handle #{entry_id}: #{exc}" + record_recent(entry_id, fields, acked: false, + note: "handler error: #{exc.message}") + end + end + + def handle_entry(entry_id, fields) + drop = false + @lock.synchronize do + if @crash_next > 0 + drop = true + @crash_next -= 1 + end + end + + if drop + @lock.synchronize { @crashed_drops += 1 } + record_recent(entry_id, fields, acked: false, + note: 'dropped (simulated crash)') + return + end + + # Ack via the shared stream so the demo's aggregate + # `acked_total` counter reflects every worker. XACK is a quick + # non-blocking command, so the brief monitor contention with + # the HTTP handlers (which read state through the same shared + # connection) is negligible in the demo. A production deployment + # could use a per-worker `Redis` connection here and aggregate + # stats separately. + @shared_stream.ack(@group, [entry_id]) + @lock.synchronize { @processed += 1 } + record_recent(entry_id, fields, acked: true, note: '') + end + + def record_recent(entry_id, fields, acked:, note:) + entry = { + 'id' => entry_id, + 'type' => fields['type'] || '', + 'fields' => fields, + 'acked' => acked, + 'note' => note, + } + @lock.synchronize do + @recent.unshift(entry) + @recent.pop while @recent.length > @recent_capacity + end + end +end diff --git a/content/develop/use-cases/streaming/ruby/demo_server.rb b/content/develop/use-cases/streaming/ruby/demo_server.rb new file mode 100644 index 0000000000..e78fc5ae27 --- /dev/null +++ b/content/develop/use-cases/streaming/ruby/demo_server.rb @@ -0,0 +1,959 @@ +#!/usr/bin/env ruby +# Redis streaming demo server. 
+# +# Run this file and visit http://localhost:8787 to watch a Redis Stream +# in action: producers append events to a single stream, two independent +# consumer groups read the same stream at their own pace, and within +# the `notifications` group two consumers share the work. +# +# Use the UI to: +# +# * Produce events into the stream. +# * Watch each consumer group's last-delivered ID, PEL count, and the +# consumers inside it. +# * Drop the next N messages from a chosen consumer to simulate a +# crash mid-processing, then run XAUTOCLAIM to reassign the stuck +# entries to a healthy consumer. +# * Replay any ID range with XRANGE to confirm the history is +# independent of consumer-group state. +# * Trim the stream with XTRIM to bound retention. + +require 'cgi' +require 'json' +require 'optparse' +require 'redis' +require 'thread' +require 'webrick' + +require_relative 'event_stream' +require_relative 'consumer_worker' + +EVENT_TYPES = %w[order.placed order.paid order.shipped order.cancelled].freeze +DEFAULT_GROUPS = { + 'notifications' => %w[worker-a worker-b], + 'analytics' => %w[worker-c], +}.freeze + +# In-memory registry of consumer workers across all groups. +# +# WEBrick dispatches each HTTP request on a fresh thread, so any code +# that mutates `@workers` (or iterates it while another handler is +# mutating it) needs the lock. 
+class StreamingDemo + def initialize(stream, redis_factory) + @stream = stream + @redis_factory = redis_factory + @workers = {} + @lock = Monitor.new + end + + def seed(groups) + @lock.synchronize do + groups.each do |group, names| + @stream.ensure_group(group, '0-0') + names.each { |name| add_worker(group, name) } + end + groups.values.sum(&:length) + end + end + + def add_worker(group, name) + @lock.synchronize do + key = [group, name] + return false if @workers.key?(key) + @stream.ensure_group(group, '0-0') + # Each worker gets its own dedicated `read_stream` (with its + # own Redis connection) for the blocking XREADGROUP loop -- a + # `Redis.new` instance holds one socket and serialises every + # command behind a monitor, and the blocking call would park + # any concurrent HTTP handler behind it. The shared `stream` + # handle (the demo server's primary connection) is used for + # non-blocking commands such as XACK and XAUTOCLAIM so all + # stats land in a single counter the UI can render. + read_stream = RedisEventStream.new( + redis: @redis_factory.call, + stream_key: @stream.stream_key, + maxlen_approx: @stream.maxlen_approx, + claim_min_idle_ms: @stream.claim_min_idle_ms, + ) + worker = ConsumerWorker.new( + read_stream: read_stream, + shared_stream: @stream, + group: group, + name: name, + ) + worker.start + @workers[key] = worker + true + end + end + + # Remove a consumer safely. + # + # `XGROUP DELCONSUMER` destroys the consumer's PEL entries + # outright, so any pending message it still owned would become + # unreachable. Before deleting, hand its PEL off to another + # consumer in the same group with `XCLAIM`. Without a peer + # consumer to take over, refuse to delete and leave the worker in + # place so the user can add a peer first. + def remove_worker(group, name) + @lock.synchronize do + key = [group, name] + worker = @workers[key] + return { 'removed' => false, 'reason' => 'not-found' } if worker.nil? 
+
+      peers = @workers.keys.select { |g, n| g == group && n != name }.map { |_, n| n }
+      if peers.empty?
+        return {
+          'removed' => false,
+          'reason' => 'no-peer',
+          'message' => (
+            "#{group}/#{name} is the only consumer in its group and may " \
+            'still own pending entries; add another consumer first so its ' \
+            'PEL can be handed over before deletion.'
+          ),
+        }
+      end
+
+      handover_target = peers.first
+
+      # Stop the worker before handing over its PEL. If it kept
+      # reading, entries delivered between the handover and
+      # `XGROUP DELCONSUMER` would land in its PEL and be destroyed.
+      @workers.delete(key)
+      worker.stop
+      claimed_count = @stream.handover_pending(group, name, handover_target)
+      @stream.delete_consumer(group, name)
+      {
+        'removed' => true,
+        'handed_over_to' => handover_target,
+        'handed_over_count' => claimed_count,
+      }
+    end
+  end
+
+  def get_worker(group, name)
+    @lock.synchronize { @workers[[group, name]] }
+  end
+
+  # Stable list of [[group, name], worker] pairs safe to iterate outside the lock.
+  def workers_snapshot
+    @lock.synchronize { @workers.to_a }
+  end
+
+  def stop_all
+    @lock.synchronize do
+      @workers.each_value(&:stop)
+      @workers.clear
+    end
+  end
+
+  def reset
+    @lock.synchronize do
+      stop_all
+      @stream.delete_stream
+      @stream.reset_stats
+      seed(DEFAULT_GROUPS)
+    end
+  end
+end
+
+# -- HTTP helpers ----------------------------------------------------
+
+def parse_args(argv)
+  opts = {
+    host: '127.0.0.1',
+    port: 8787,
+    redis_host: 'localhost',
+    redis_port: 6379,
+    stream_key: 'demo:events:orders',
+    maxlen: 2000,
+    claim_idle_ms: 5000,
+    reset_on_start: true,
+  }
+  OptionParser.new do |o|
+    o.banner = 'Usage: demo_server.rb [options]'
+    o.on('--host HOST', 'HTTP bind host') { |v| opts[:host] = v }
+    o.on('--port PORT', Integer, 'HTTP bind port') { |v| opts[:port] = v }
+    o.on('--redis-host HOST', 'Redis host') { |v| opts[:redis_host] = v }
+    o.on('--redis-port PORT', Integer, 'Redis port') { |v| opts[:redis_port] = v }
+    o.on('--stream-key NAME', 'Redis Stream key') { |v| opts[:stream_key] = v }
+    o.on('--maxlen N', Integer, 'Approximate MAXLEN cap on every XADD') { |v| opts[:maxlen] = v }
+    o.on('--claim-idle-ms
MS', Integer, + 'Minimum idle time before XAUTOCLAIM may reassign a pending entry') do |v| + opts[:claim_idle_ms] = v + end + o.on('--no-reset', 'Keep existing stream data on startup') { opts[:reset_on_start] = false } + end.parse!(argv) + opts +end + +def parse_form(body) + CGI.parse(body.to_s) +end + +def clamp_int(value, lo, hi, default) + v = Integer(value) + [[v, lo].max, hi].min +rescue ArgumentError, TypeError + default +end + +# -- HTML template --------------------------------------------------- + +HTML_TEMPLATE = <<~HTML.freeze + + + + + + Redis Streaming Demo + + + +
+  <!-- The markup, styles, form controls, and inline script of this template were lost in extraction; only its visible text survives, listed below. -->
+  redis-rb + WEBrick
+  Redis Streaming Demo
+  Producers append events to a single Redis Stream
+  (__STREAM_KEY__). Two consumer groups read the same
+  stream independently: notifications shares its work
+  across two consumers, analytics processes the full
+  flow on its own. Acknowledge with XACK, recover
+  crashed deliveries with XAUTOCLAIM, replay any range
+  with XRANGE, and bound retention with XTRIM.
+
+  Stream state
+  Loading...
+
+  Produce events
+  Events are appended with XADD with an approximate
+  MAXLEN ~ __MAXLEN__ retention cap.
+
+  Replay range (XRANGE)
+  Reads a slice of history. Replay is independent of any
+  consumer group: no cursors move, no acks happen.
+
+  Trim retention (XTRIM)
+  Cap the stream length. Approximate trimming releases whole
+  macro-nodes, which is much cheaper than exact trimming.
+
+  Consumer groups
+  Loading...
+
+  Pending entries (XPENDING)
+  Entries delivered to a consumer that haven't been acked yet.
+  Idle time ≥ __CLAIM_IDLE__ ms is eligible for
+  XAUTOCLAIM.
+  Loading...
+
+  Last result
+  Produce events, replay a range, or trigger an autoclaim to see results.
+ + + + +HTML + +# -- Servlet -------------------------------------------------------- + +class StreamingServlet < WEBrick::HTTPServlet::AbstractServlet + def initialize(server, stream, demo) + super(server) + @stream = stream + @demo = demo + end + + def do_GET(req, res) + case req.path + when '/', '/index.html' + send_html(res, html_page) + when '/state' + send_json(res, build_state) + when '/replay' + handle_replay(req, res) + else + res.status = 404 + res.body = 'not found' + end + end + + def do_POST(req, res) + case req.path + when '/produce' then handle_produce(req, res) + when '/add-worker' then handle_add_worker(req, res) + when '/remove-worker' then handle_remove_worker(req, res) + when '/crash' then handle_crash(req, res) + when '/autoclaim' then handle_autoclaim(req, res) + when '/trim' then handle_trim(req, res) + when '/reset' + count = @demo.reset + send_json(res, 'consumers' => count) + else + res.status = 404 + res.body = 'not found' + end + end + + private + + # ---- POST handlers ---------------------------------------------- + + def handle_produce(req, res) + params = parse_form(req.body) + count = clamp_int((params['count'] || ['1']).first || '1', 1, 500, 1) + event_type = ((params['type'] || ['']).first || '').strip + events = Array.new(count) do + picked = event_type.empty? ? EVENT_TYPES.sample : event_type + [picked, fake_payload] + end + ids = @stream.produce_batch(events) + send_json(res, 'produced' => ids.length, 'ids' => ids) + end + + def handle_add_worker(req, res) + params = parse_form(req.body) + group = ((params['group'] || ['']).first || '').strip + name = ((params['name'] || ['']).first || '').strip + if group.empty? || name.empty? 
+ send_json(res, { 'error' => 'group and name are required' }, status: 400) + return + end + unless @demo.add_worker(group, name) + send_json(res, { 'error' => "#{group}/#{name} already exists" }, status: 409) + return + end + send_json(res, 'group' => group, 'name' => name) + end + + def handle_remove_worker(req, res) + params = parse_form(req.body) + group = ((params['group'] || ['']).first || '').strip + name = ((params['name'] || ['']).first || '').strip + result = @demo.remove_worker(group, name) + status = if result['removed'] || result['reason'] == 'not-found' + 200 + else + 409 + end + send_json(res, result, status: status) + end + + def handle_crash(req, res) + params = parse_form(req.body) + group = ((params['group'] || ['']).first || '').strip + name = ((params['name'] || ['']).first || '').strip + count = clamp_int((params['count'] || ['1']).first || '1', 0, 10_000, 1) + worker = @demo.get_worker(group, name) + if worker.nil? + send_json(res, { 'error' => "unknown consumer #{group}/#{name}" }, status: 404) + return + end + worker.crash_next(count) + send_json(res, 'queued' => count) + end + + def handle_autoclaim(req, res) + params = parse_form(req.body) + group = ((params['group'] || ['']).first || '').strip + consumer = ((params['consumer'] || ['']).first || '').strip + if group.empty? || consumer.empty? + send_json(res, { 'error' => 'group and consumer are required' }, status: 400) + return + end + worker = @demo.get_worker(group, consumer) + if worker.nil? + send_json(res, { 'error' => "unknown consumer #{group}/#{consumer}" }, status: 404) + return + end + # `reap_idle_pel` runs XAUTOCLAIM(self) + process + ack. `deleted_ids` + # are PEL entries whose stream payload was already trimmed by + # MAXLEN ~ before the sweep ran. Redis 7+ removes them from the PEL + # inside XAUTOCLAIM itself, so the caller doesn't have to XACK them; + # in production they would be routed to a dead-letter store for + # offline inspection. 
+ result = worker.reap_idle_pel + send_json(res, + 'claimed' => result[:claimed], + 'processed' => result[:processed], + 'deleted' => result[:deleted_ids], + 'min_idle_ms' => @stream.claim_min_idle_ms) + end + + def handle_trim(req, res) + params = parse_form(req.body) + maxlen = clamp_int((params['maxlen'] || ['0']).first || '0', 0, 1_000_000_000, 0) + deleted = @stream.trim_maxlen(maxlen) + send_json(res, 'deleted' => deleted, 'maxlen' => maxlen) + end + + def handle_replay(req, res) + query = req.query + start_id = (query['start'].to_s.empty? ? '-' : query['start']) + end_id = (query['end'].to_s.empty? ? '+' : query['end']) + limit = clamp_int(query['count'].to_s.empty? ? '20' : query['count'], 1, 500, 20) + entries = @stream.replay(start_id: start_id, end_id: end_id, count: limit) + send_json(res, + 'start' => start_id, + 'end' => end_id, + 'limit' => limit, + 'entries' => entries.map { |id, fields| { 'id' => id, 'fields' => fields } }) + end + + # ---- State assembly --------------------------------------------- + + def build_state + stream_info = @stream.info_stream + groups = @stream.info_groups + + workers_snapshot = @demo.workers_snapshot + groups_detail = [] + pending_rows = [] + + groups.each do |group| + group_name = group['name'] + consumer_info = {} + @stream.info_consumers(group_name).each { |c| consumer_info[c['name']] = c } + consumers_detail = [] + workers_snapshot.each do |(g_name, c_name), worker| + next unless g_name == group_name + info = consumer_info[c_name] || {} + status = worker.status + consumers_detail << status.merge( + 'pending' => info['pending'] || 0, + 'idle_ms' => info['idle_ms'] || 0, + 'recent' => worker.recent, + ) + end + # Also include consumers that exist in Redis but not in our + # in-process registry (e.g. orphaned after a restart). + consumer_info.each do |c_name, info| + next if consumers_detail.any? 
{ |c| c['name'] == c_name } + consumers_detail << { + 'name' => c_name, + 'group' => group_name, + 'processed' => 0, + 'reaped' => 0, + 'crashed_drops' => 0, + 'paused' => false, + 'crash_queued' => 0, + 'alive' => false, + 'pending' => info['pending'] || 0, + 'idle_ms' => info['idle_ms'] || 0, + 'recent' => [], + } + end + consumers_detail.sort_by! { |c| c['name'] } + groups_detail << group.merge('consumers_detail' => consumers_detail) + + @stream.pending_detail(group_name, count: 50).each do |row| + pending_rows << row.merge('group' => group_name) + end + end + + # XREVRANGE returns the newest N entries (in reverse order); the tail + # view wants the most recent activity, not the head of history. + tail = @stream.tail(count: 10).map { |id, fields| { 'id' => id, 'fields' => fields } } + + { + 'stream' => stream_info, + 'tail' => tail, + 'groups' => groups_detail, + 'pending' => pending_rows, + 'stats' => @stream.stats, + } + end + + # ---- HTTP plumbing ---------------------------------------------- + + def send_html(res, body) + res.status = 200 + res['Content-Type'] = 'text/html; charset=utf-8' + res.body = body + end + + def send_json(res, payload, status: 200) + res.status = status + res['Content-Type'] = 'application/json' + res.body = JSON.generate(payload) + end + + def html_page + HTML_TEMPLATE + .gsub('__STREAM_KEY__', @stream.stream_key) + .gsub('__MAXLEN__', @stream.maxlen_approx.to_s) + .gsub('__CLAIM_IDLE__', @stream.claim_min_idle_ms.to_s) + end +end + +def fake_payload + { + 'order_id' => format('o-%04d', rand(1000..9999)), + 'customer' => %w[alice bob carol dan erin].sample, + 'amount' => format('%.2f', rand * 245 + 5), + } +end + +# -- Entry point ----------------------------------------------------- + +def main + args = parse_args(ARGV) + + redis_factory = -> { Redis.new(host: args[:redis_host], port: args[:redis_port]) } + stream = RedisEventStream.new( + redis: redis_factory.call, + stream_key: args[:stream_key], + maxlen_approx: args[:maxlen], 
+ claim_min_idle_ms: args[:claim_idle_ms], + ) + + demo = StreamingDemo.new(stream, redis_factory) + + if args[:reset_on_start] + puts "Deleting any existing data at key '#{args[:stream_key]}'" \ + ' for a clean demo run (pass --no-reset to keep it).' + stream.delete_stream + end + seeded = demo.seed(DEFAULT_GROUPS) + + server = WEBrick::HTTPServer.new( + BindAddress: args[:host], + Port: args[:port], + Logger: WEBrick::Log.new($stderr, WEBrick::Log::WARN), + AccessLog: [], + ) + server.mount('/', StreamingServlet, stream, demo) + + trap('INT') { server.shutdown } + trap('TERM') { server.shutdown } + + puts "Redis streaming demo server listening on http://#{args[:host]}:#{args[:port]}" + puts "Using Redis at #{args[:redis_host]}:#{args[:redis_port]}" \ + " with stream key '#{args[:stream_key]}' (MAXLEN ~ #{args[:maxlen]})" + puts "Seeded #{seeded} consumer(s) across #{DEFAULT_GROUPS.length} group(s)" + + begin + server.start + ensure + demo.stop_all + end +end + +main if $PROGRAM_NAME == __FILE__ diff --git a/content/develop/use-cases/streaming/ruby/event_stream.rb b/content/develop/use-cases/streaming/ruby/event_stream.rb new file mode 100644 index 0000000000..ac449a3f28 --- /dev/null +++ b/content/develop/use-cases/streaming/ruby/event_stream.rb @@ -0,0 +1,370 @@ +# Redis event-stream helper backed by a single Redis Stream. +# +# Producers append events with `XADD`. Consumers belong to consumer +# groups and read with `XREADGROUP`. The group as a whole tracks a +# single `last-delivered-id` cursor, and each consumer gets its own +# pending-entries list (PEL) of in-flight messages it has been handed. +# Once a consumer has processed an entry it acknowledges it with +# `XACK`; entries left unacknowledged past an idle threshold can be +# swept to a healthy consumer with `XAUTOCLAIM` (or to a specific one +# with `XCLAIM`). +# +# Each `XADD` carries an approximate `MAXLEN` so the stream stays +# bounded as it rolls forward. 
`XRANGE` supports replay over the +# retained history for debugging, audit, or rebuilding a downstream +# projection. Note that approximate trimming can release entries that +# are still in a group's PEL: those entries appear in `XAUTOCLAIM`'s +# deleted-IDs list, which the caller should log and route to a +# dead-letter store. Redis 7+ removes them from the PEL inside the +# `XAUTOCLAIM` call itself, so no explicit `XACK` is needed. +# +# The same stream can be read by any number of consumer groups -- each +# group has its own cursor and its own pending lists, so analytics, +# notifications, and audit can all process the full event flow at their +# own pace without coordinating with each other. + +require 'redis' +require 'thread' + +# Producer/consumer helper for a single Redis Stream with consumer groups. +class RedisEventStream + attr_reader :stream_key, :maxlen_approx, :claim_min_idle_ms + + def initialize(redis:, stream_key: 'demo:events:orders', + maxlen_approx: 10_000, claim_min_idle_ms: 15_000) + @redis = redis + @stream_key = stream_key + @maxlen_approx = maxlen_approx + @claim_min_idle_ms = claim_min_idle_ms + + @stats_lock = Mutex.new + @produced_total = 0 + @acked_total = 0 + @claimed_total = 0 + end + + # ------------------------------------------------------------------ + # Producer + # ------------------------------------------------------------------ + + # Append a single event. Returns the stream ID Redis assigned. + def produce(event_type, payload) + produce_batch([[event_type, payload]]).first + end + + # Pipeline several `XADD` calls in one round trip. + # + # Each entry carries an approximate `MAXLEN` cap. The `~` flavour + # lets Redis trim at a macro-node boundary, which is much cheaper + # than exact trimming and is the right call for a retention + # guardrail rather than a hard size limit. + def produce_batch(events) + events = events.to_a + return [] if events.empty? 
+ ids = @redis.pipelined do |pipe| + events.each do |event_type, payload| + fields = encode_fields(event_type, payload) + pipe.xadd(@stream_key, fields, + maxlen: @maxlen_approx, approximate: true) + end + end + @stats_lock.synchronize { @produced_total += ids.length } + ids + end + + # ------------------------------------------------------------------ + # Consumer groups + # ------------------------------------------------------------------ + + # Create the consumer group if it doesn't exist. + # + # `$` means "deliver only events appended after this point"; pass + # `0-0` to replay the entire stream into a fresh group. + def ensure_group(group, start_id = '$') + @redis.xgroup(:create, @stream_key, group, start_id, mkstream: true) + rescue Redis::CommandError => exc + raise unless exc.message.include?('BUSYGROUP') + end + + def delete_group(group) + @redis.xgroup(:destroy, @stream_key, group).to_i + end + + # Read new entries for this consumer via `XREADGROUP`. + # + # The `>` ID means "deliver entries this consumer group has not + # delivered to *anyone* yet" -- that is the at-least-once path. + # Replaying an explicit ID instead would re-deliver an entry that + # is already in this consumer's pending list (see + # `consume_own_pel` for that recovery path). + def consume(group, consumer, count: 10, block_ms: 500) + result = @redis.xreadgroup(group, consumer, @stream_key, '>', + count: count, block: block_ms) + flatten_entries(result) + end + + # Re-deliver entries already in this consumer's PEL. + # + # Reading with an explicit ID (`0` here) instead of `>` replays the + # entries already assigned to this consumer name without advancing + # the group's `last-delivered-id`. This is the canonical recovery + # path after a crash on the same consumer name, and is also how a + # consumer picks up entries that another consumer (or `XAUTOCLAIM`) + # handed to it. 
+ def consume_own_pel(group, consumer, count: 10) + result = @redis.xreadgroup(group, consumer, @stream_key, '0', + count: count) + flatten_entries(result) + end + + def ack(group, ids) + ids = Array(ids) + return 0 if ids.empty? + n = @redis.xack(@stream_key, group, ids).to_i + @stats_lock.synchronize { @acked_total += n } + n + end + + # Sweep idle pending entries to `consumer`. + # + # A single `XAUTOCLAIM` call scans up to `page_count` PEL entries + # starting at `start_id` and returns a continuation cursor. For a + # full sweep of the PEL, loop until the cursor returns to `0-0` (or + # hit `max_pages` as a safety net so a very large PEL can't + # monopolise the call). + # + # Returns a hash `{claimed: [...], deleted_ids: [...]}`. + # `deleted_ids` are PEL entries whose stream payload had already + # been trimmed by the time this sweep ran (typically because + # `MAXLEN ~` retention outran a slow consumer). `XAUTOCLAIM` + # removes those dangling slots from the PEL itself -- the caller + # does *not* need to `XACK` them -- but they cannot be retried, so + # log and route them to a dead-letter store for observability. + # + # redis-rb 5.x's `xautoclaim` wrapper discards the third return + # element (deleted IDs), so this helper drives `XAUTOCLAIM` + # directly via `redis.call` and parses the raw reply. + def autoclaim(group, consumer, page_count: 100, start_id: '0-0', + max_pages: 10) + claimed_all = [] + deleted_all = [] + cursor = start_id + max_pages.times do + reply = @redis.call('XAUTOCLAIM', @stream_key, group, consumer, + @claim_min_idle_ms.to_s, cursor, + 'COUNT', page_count.to_s) + next_cursor = reply[0] + claimed = parse_entries(reply[1]) + deleted = Array(reply[2]) + claimed_all.concat(claimed) + deleted_all.concat(deleted) + break if next_cursor == '0-0' + cursor = next_cursor + end + @stats_lock.synchronize { @claimed_total += claimed_all.length } + { claimed: claimed_all, deleted_ids: deleted_all } + end + + # Drop a consumer from a group. 
+ # + # `XGROUP DELCONSUMER` destroys this consumer's PEL entries -- any + # entry it still owned is no longer tracked anywhere in the group, + # and `XAUTOCLAIM` will never find it again. Always + # `handover_pending` (or `XCLAIM` it manually) to a healthy + # consumer first; this method is the raw destructive call and is + # exposed only for explicit cleanup. + def delete_consumer(group, consumer) + @redis.xgroup(:delconsumer, @stream_key, group, consumer).to_i + rescue Redis::CommandError + 0 + end + + # Move every PEL entry owned by `from_consumer` to `to_consumer`. + # + # Enumerates the source consumer's PEL with + # `XPENDING ... CONSUMER` and reassigns each ID with `XCLAIM` at + # zero idle time so the move is unconditional. (`XAUTOCLAIM` does + # not filter by source consumer, so it cannot be used for a + # per-consumer handover.) + # + # Call this before `delete_consumer` whenever the source still has + # pending entries -- otherwise `XGROUP DELCONSUMER` would silently + # destroy them and they could never be recovered. + def handover_pending(group, from_consumer, to_consumer, batch: 100) + moved = 0 + loop do + rows = @redis.xpending(@stream_key, group, '-', '+', batch, + from_consumer) + break if rows.nil? || rows.empty? + ids = rows.map { |row| row['entry_id'] } + claimed = @redis.xclaim(@stream_key, group, to_consumer, 0, ids) + moved += claimed.is_a?(Hash) ? claimed.length : Array(claimed).length + break if rows.length < batch + end + @stats_lock.synchronize { @claimed_total += moved } + moved + end + + # ------------------------------------------------------------------ + # Replay, length, trim + # ------------------------------------------------------------------ + + # Range read with `XRANGE` for replay or audit. + # + # Read-only: ranges do not update any group cursor and do not ack + # anything. Useful for bootstrapping a new projection, for building + # an audit view, or for debugging what actually went through the + # stream. 
+ def replay(start_id: '-', end_id: '+', count: 100) + rows = @redis.xrange(@stream_key, start_id, end_id, count: count) + rows.map { |entry_id, fields| [entry_id, fields] } + end + + def length + @redis.xlen(@stream_key).to_i + end + + def trim_maxlen(maxlen) + @redis.xtrim(@stream_key, maxlen, approximate: true).to_i + end + + def trim_minid(minid) + @redis.xtrim(@stream_key, minid, strategy: 'MINID', + approximate: true).to_i + end + + # ------------------------------------------------------------------ + # Inspection + # ------------------------------------------------------------------ + + # Subset of `XINFO STREAM` that's safe to JSON-encode. + def info_stream + raw = @redis.xinfo(:stream, @stream_key) + first = raw['first-entry'] + last = raw['last-entry'] + { + 'length' => raw['length'].to_i, + 'last_generated_id' => raw['last-generated-id'], + 'first_entry_id' => first ? first[0] : nil, + 'last_entry_id' => last ? last[0] : nil, + } + rescue Redis::CommandError + { 'length' => 0, 'last_generated_id' => nil, + 'first_entry_id' => nil, 'last_entry_id' => nil } + end + + def info_groups + rows = @redis.xinfo(:groups, @stream_key) + rows.map do |row| + { + 'name' => row['name'], + 'consumers' => row.fetch('consumers', 0).to_i, + 'pending' => row.fetch('pending', 0).to_i, + 'last_delivered_id' => row['last-delivered-id'], + 'lag' => row['lag'].nil? ? nil : row['lag'].to_i, + } + end + rescue Redis::CommandError + [] + end + + def info_consumers(group) + rows = @redis.xinfo(:consumers, @stream_key, group) + rows.map do |row| + { + 'name' => row['name'], + 'pending' => row.fetch('pending', 0).to_i, + 'idle_ms' => row.fetch('idle', 0).to_i, + } + end + rescue Redis::CommandError + [] + end + + # Per-entry PEL view (id, consumer, idle, deliveries). + def pending_detail(group, count: 20) + rows = @redis.xpending(@stream_key, group, '-', '+', count) + return [] if rows.nil? || rows.empty? 
+ rows.map do |row| + { + 'id' => row['entry_id'], + 'consumer' => row['consumer'], + 'idle_ms' => row['elapsed'].to_i, + 'deliveries' => row['count'].to_i, + } + end + rescue Redis::CommandError + [] + end + + def stats + @stats_lock.synchronize do + { + 'produced_total' => @produced_total, + 'acked_total' => @acked_total, + 'claimed_total' => @claimed_total, + } + end + end + + def reset_stats + @stats_lock.synchronize do + @produced_total = 0 + @acked_total = 0 + @claimed_total = 0 + end + end + + # ------------------------------------------------------------------ + # Demo housekeeping + # ------------------------------------------------------------------ + + # Drop the stream key entirely. Used by the demo's reset path. + def delete_stream + @redis.del(@stream_key) + end + + # XREVRANGE for the demo's "tail" view. Exposed as a convenience + # because the demo server wants the newest N entries. + def tail(count: 10) + rows = @redis.xrevrange(@stream_key, '+', '-', count: count) + rows.map { |entry_id, fields| [entry_id, fields] } + end + + private + + def encode_fields(event_type, payload) + fields = { + 'type' => event_type, + 'ts_ms' => (Time.now.to_f * 1000).to_i.to_s, + } + payload.each do |key, value| + fields[key.to_s] = value.nil? ? '' : value.to_s + end + fields + end + + # `XREADGROUP` via redis-rb returns `{ stream_key => { id => {field=>value} } }` + # (a hash, not the python list-of-tuples). Flatten to `[[id, fields], ...]`. + def flatten_entries(raw) + return [] if raw.nil? || raw.empty? + out = [] + raw.each do |_stream, entries| + entries.each do |entry_id, fields| + out << [entry_id, fields || {}] + end + end + out + end + + # Parse the raw XAUTOCLAIM "entries" reply (flat array of + # `[id, [field, value, field, value, ...]]`) into `[[id, hash], ...]`. + def parse_entries(raw) + return [] if raw.nil? 
+ raw.compact.map do |entry_id, kv| + fields = {} + Array(kv).each_slice(2) { |k, v| fields[k] = v } + [entry_id, fields] + end + end +end diff --git a/content/develop/use-cases/streaming/rust/Cargo.toml b/content/develop/use-cases/streaming/rust/Cargo.toml new file mode 100644 index 0000000000..6b27036b79 --- /dev/null +++ b/content/develop/use-cases/streaming/rust/Cargo.toml @@ -0,0 +1,16 @@ +[package] +name = "streaming-demo" +version = "0.1.0" +edition = "2021" + +[dependencies] +redis = { version = "0.24", features = ["tokio-comp", "aio", "connection-manager", "streams"] } +tokio = { version = "1", features = ["full"] } +axum = "0.7" +serde = { version = "1.0", features = ["derive"] } +serde_json = "1.0" +rand = "0.8" + +[[bin]] +name = "demo_server" +path = "demo_server.rs" diff --git a/content/develop/use-cases/streaming/rust/_index.md b/content/develop/use-cases/streaming/rust/_index.md new file mode 100644 index 0000000000..8cdccbdeb7 --- /dev/null +++ b/content/develop/use-cases/streaming/rust/_index.md @@ -0,0 +1,548 @@ +--- +categories: +- docs +- develop +- stack +- oss +- rs +- rc +description: Implement a Redis event-streaming pipeline in Rust with redis-rs +linkTitle: redis-rs example (Rust) +title: Redis streaming with redis-rs +weight: 9 +--- + +This guide shows you how to build a Redis-backed event-streaming pipeline in Rust with the [`redis`](https://crates.io/crates/redis) crate (redis-rs). It includes a small local web server built with `axum` and `tokio` so you can produce events into a single Redis Stream, watch two independent consumer groups read it at their own pace, and recover stuck deliveries with `XAUTOCLAIM` after simulating a consumer crash. + +## Overview + +A Redis Stream is an append-only log of field/value entries with auto-generated, time-ordered IDs. Producers append with [`XADD`]({{< relref "/commands/xadd" >}}); consumers belong to *consumer groups* and read with [`XREADGROUP`]({{< relref "/commands/xreadgroup" >}}). 
The group as a whole tracks a single `last-delivered-id` cursor, and each consumer gets its own pending-entries list (PEL) of messages it has been handed but not yet acknowledged. Once a consumer has processed an entry it calls [`XACK`]({{< relref "/commands/xack" >}}) to clear the entry from its PEL; entries left unacknowledged past an idle threshold can be reassigned to a healthy consumer with [`XAUTOCLAIM`]({{< relref "/commands/xautoclaim" >}}). + +That gives you: + +* Ordered, durable history that many independent consumer groups can read at their own pace +* At-least-once delivery, with per-consumer pending lists and automatic recovery of crashed consumers +* Horizontal scaling within a group — add a consumer and Redis automatically splits the work +* Replay of any range with [`XRANGE`]({{< relref "/commands/xrange" >}}), independent of consumer-group state +* Bounded retention through [`XADD MAXLEN ~`]({{< relref "/commands/xadd" >}}) or + [`XTRIM MINID ~`]({{< relref "/commands/xtrim" >}}), without a separate cleanup job + +In this example, producers append order events (`order.placed`, `order.paid`, `order.shipped`, `order.cancelled`) to a single stream at `demo:events:orders`. Two consumer groups read the same stream: + +* **`notifications`** — two consumers (`worker-a`, `worker-b`) sharing the work, modelling a fan-out worker pool. +* **`analytics`** — one consumer (`worker-c`) processing the full event flow on its own. + +## How it works + +The flow looks like this: + +1. The application calls `stream.produce(event_type, payload).await` which runs [`XADD`]({{< relref "/commands/xadd" >}}) with an approximate [`MAXLEN ~`]({{< relref "/commands/xadd" >}}) cap. Redis assigns an auto-generated time-ordered ID. +2. Each consumer task loops on [`XREADGROUP`]({{< relref "/commands/xreadgroup" >}}) with the special ID `>` (meaning "deliver entries this group has not yet delivered to anyone") and a short block timeout. +3. 
After processing each entry, the consumer calls [`XACK`]({{< relref "/commands/xack" >}}) so Redis can drop it from the group's pending list. +4. If a consumer is killed (or crashes) before acking, its entries sit in the group's PEL. A periodic [`XAUTOCLAIM`]({{< relref "/commands/xautoclaim" >}}) sweep reassigns idle entries to a healthy consumer. +5. Anyone — including code outside the consumer groups — can read history with [`XRANGE`]({{< relref "/commands/xrange" >}}) without affecting any group's cursor. + +Each consumer group has its own cursor (`last-delivered-id`) and its own pending list, so the two groups in this demo process the same events without coordinating with each other. + +## The event-stream helper + +The `EventStream` struct wraps the stream operations +([source](https://github.com/redis/docs/blob/main/content/develop/use-cases/streaming/rust/event_stream.rs)): + +```rust +use redis::aio::ConnectionManager; +use redis::Client; +use std::collections::HashMap; + +let client = Client::open("redis://127.0.0.1:6379/")?; +let conn = ConnectionManager::new(client).await?; +let stream = EventStream::new( + conn, + "demo:events:orders", // stream key + 2000, // approximate MAXLEN guardrail + 5000, // XAUTOCLAIM idle threshold (ms) +); + +// Producer +let mut payload = HashMap::new(); +payload.insert("order_id".into(), "o-1234".into()); +payload.insert("customer".into(), "alice".into()); +payload.insert("amount".into(), "49.50".into()); +let stream_id = stream.produce("order.placed", payload).await?; + +// Consumer group + one consumer +stream.ensure_group("notifications", "0-0").await?; +let entries = stream.consume("notifications", "worker-a", 10, 500).await?; +for (entry_id, fields) in entries { + handle(&fields); + stream.ack("notifications", vec![entry_id]).await?; // XACK +} + +// Recover stuck PEL entries by reaping them into a healthy consumer. 
+// The textbook pattern: each consumer periodically calls XAUTOCLAIM +// with itself as the target and processes whatever it claimed. +// `ConsumerWorker::reap_idle_pel` wraps that flow; the low-level helper +// `stream.autoclaim(group, target, ...)` is also available if you +// want to drive XAUTOCLAIM directly. +let result = worker_b.reap_idle_pel().await; +// result == ReapResult { claimed: N, processed: M, deleted_ids: [...] } +// deleted_ids are PEL entries whose payload was already trimmed. +// Redis 7+ has already removed those slots from the PEL, so no XACK +// is needed — log them and route to a dead-letter store for audit. + +// Replay history (independent of any group's cursor) +for (entry_id, fields) in stream.replay("-", "+", 50).await? { + println!("{} {:?}", entry_id, fields); +} +``` + +### Data model + +Each event is a single stream entry — a flat map of field/value strings — with an auto-generated time-ordered ID: + +```text +demo:events:orders + 1716998413541-0 type=order.placed order_id=o-1234 customer=alice amount=49.50 ts_ms=... + 1716998413542-0 type=order.paid order_id=o-1234 customer=alice amount=49.50 ts_ms=... + 1716998413542-1 type=order.shipped order_id=o-1235 customer=bob amount=12.00 ts_ms=... + ... +``` + +The ID is `{milliseconds}-{sequence}`, monotonically increasing within the stream, so you can range-query by approximate wall-clock time without an extra index. (IDs are ordered within a stream, not across streams — two events appended to different streams at the same millisecond can produce the same ID.) The implementation uses: + +* [`XADD ... 
MAXLEN ~ n`]({{< relref "/commands/xadd" >}}), pipelined, for batch production with a retention cap +* [`XREADGROUP`]({{< relref "/commands/xreadgroup" >}}) with the special ID `>` for fresh deliveries to a consumer +* [`XACK`]({{< relref "/commands/xack" >}}) on every processed entry +* [`XAUTOCLAIM`]({{< relref "/commands/xautoclaim" >}}) for sweeping idle pending entries to a healthy consumer +* [`XRANGE`]({{< relref "/commands/xrange" >}}) for replay and audit +* [`XPENDING`]({{< relref "/commands/xpending" >}}) for inspecting the per-group pending list +* [`XINFO STREAM`]({{< relref "/commands/xinfo-stream" >}}), + [`XINFO GROUPS`]({{< relref "/commands/xinfo-groups" >}}), and + [`XINFO CONSUMERS`]({{< relref "/commands/xinfo-consumers" >}}) for surface-level observability +* [`XTRIM`]({{< relref "/commands/xtrim" >}}) for explicit retention enforcement + +## Producing events + +`produce_batch` pipelines `XADD` calls in a single round trip. Each call carries an approximate `MAXLEN ~` cap so the stream stays bounded as it rolls forward: + +```rust +pub async fn produce_batch( + &self, + events: Vec<(String, HashMap<String, String>)>, +) -> RedisResult<Vec<String>> { + if events.is_empty() { + return Ok(Vec::new()); + } + let mut pipe = redis::pipe(); + for (event_type, payload) in &events { + let fields = encode_fields(event_type, payload); + pipe.cmd("XADD") + .arg(&self.stream_key) + .arg("MAXLEN") + .arg("~") + .arg(self.maxlen_approx) + .arg("*"); + for (k, v) in &fields { + pipe.arg(k).arg(v); + } + } + let mut conn = self.conn.clone(); + let ids: Vec<String> = pipe.query_async(&mut conn).await?; + self.stats + .produced_total + .fetch_add(ids.len() as u64, Ordering::Relaxed); + Ok(ids) +} +``` + +The `~` flavour of `MAXLEN` lets Redis trim at a macro-node boundary, which is much cheaper than exact trimming and is what you want when the cap is a retention *guardrail*, not a hard size constraint.
With 300 events produced and `MAXLEN ~ 50`, you might end up with 100 entries left — Redis released the oldest whole macro-node and stopped. The next `XADD` will keep length stable. + +If you genuinely need an exact cap (rare), use `MAXLEN` without the `~`. The performance difference is significant on busy streams. + +## Reading with a consumer group + +Each consumer in a group runs the same `XREADGROUP` loop. The special ID `>` means "deliver entries this group has not yet delivered to *anyone*": + +```rust +pub async fn consume( + &self, + group: &str, + consumer: &str, + count: usize, + block_ms: usize, +) -> RedisResult> { + let opts = StreamReadOptions::default() + .group(group, consumer) + .count(count) + .block(block_ms); + let mut conn = self.conn.clone(); + let reply: Option = conn + .xread_options(&[self.stream_key.as_str()], &[">"], &opts) + .await?; + Ok(flatten_read_reply(reply)) +} +``` + +`block_ms` makes the call efficient even when the stream is idle: the client parks on the server until either an entry arrives or the timeout expires, so consumers don't busy-loop. + +Reading with an explicit ID like `0-0` instead of `>` does something different — it replays entries already delivered to *this* consumer name (its private PEL). That is the canonical recovery path when the same consumer restarts: catch up on its own pending entries first, then resume reading new ones. + +## Acknowledging entries + +Once the consumer has processed an entry, `XACK` tells Redis it can drop the entry from the group's pending list: + +```rust +pub async fn ack(&self, group: &str, ids: Vec) -> RedisResult { + if ids.is_empty() { + return Ok(0); + } + let mut conn = self.conn.clone(); + let n: i64 = conn.xack(&self.stream_key, group, &ids).await?; + self.stats.acked_total.fetch_add(n as u64, Ordering::Relaxed); + Ok(n) +} +``` + +This is the linchpin of at-least-once delivery: an entry that is never acked stays in the PEL until a claim moves it elsewhere. 
If your consumer task crashes between processing and ack, the next claim sweep picks the entry back up. The one caveat is retention: `XADD MAXLEN ~` and `XTRIM` can release the entry's *payload* even while its ID is still in the PEL. The next `XAUTOCLAIM` returns those IDs in its `deleted` list and removes them from the PEL inside the same command — the entry cannot be retried, so the caller should log it and route to a dead-letter store for audit. The example handles this explicitly in the autoclaim flow further down. + +The trade-off is the opposite of pub/sub: a slow or crashed consumer doesn't lose messages, but it does mean your downstream system must be idempotent. If you process an order twice because the first attempt died after the side effect but before the ack, the second attempt must be safe. + +## Multiple consumer groups, one stream + +The big difference between Redis Streams and a job queue is that any number of independent consumer groups can read the same stream. The demo sets up two groups on `demo:events:orders`: + +```rust +stream.ensure_group("notifications", "0-0").await?; +stream.ensure_group("analytics", "0-0").await?; +``` + +Each group has its own cursor. Producing 5 events results in `notifications` and `analytics` each receiving all 5, with no coordination between them. Within `notifications`, the work is split across `worker-a` and `worker-b`: Redis hands each `XREADGROUP` call whatever entries are not yet delivered to anyone in the group, so adding a second worker doubles throughput without any rebalance logic. + +The `"0-0"` argument means "deliver everything in the stream from the beginning" — useful in a demo and for fresh groups bootstrapping from history. In production, a brand-new group reading a long-existing stream usually starts at `$` ("only events after this point") and uses [`XRANGE`]({{< relref "/commands/xrange" >}}) explicitly if it needs history. 
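The cursor behaviour described above can be sketched with a toy in-memory model. This is not a Redis client and not part of the demo: `ToyStream`, `create_group`, and `read_new` are hypothetical names invented for illustration, and the model ignores the PEL, acks, and entry IDs. It only shows that each group keeps its own cursor, that consumers inside a group share that cursor, and how a `0-0` start differs from `$`.

```rust
use std::collections::HashMap;

/// Toy in-memory model of one stream plus its consumer groups.
/// NOT a Redis client: it only mimics how each group's
/// `last-delivered-id` cursor advances independently.
struct ToyStream {
    entries: Vec<String>,
    // group name -> index of the next entry the group has not delivered yet
    cursors: HashMap<String, usize>,
}

impl ToyStream {
    fn new() -> Self {
        ToyStream { entries: Vec::new(), cursors: HashMap::new() }
    }

    /// Models XADD: append one entry to the log.
    fn add(&mut self, payload: &str) {
        self.entries.push(payload.to_string());
    }

    /// Models XGROUP CREATE. `start` is "$" (only entries appended after
    /// this call); anything else is treated as "0-0" (from the beginning).
    fn create_group(&mut self, group: &str, start: &str) {
        let cursor = if start == "$" { self.entries.len() } else { 0 };
        self.cursors.entry(group.to_string()).or_insert(cursor);
    }

    /// Models XREADGROUP with the `>` ID: hand out up to `count` entries
    /// that nobody in the group has received, then advance the cursor.
    fn read_new(&mut self, group: &str, count: usize) -> Vec<String> {
        let cursor = match self.cursors.get_mut(group) {
            Some(c) => c,
            None => return Vec::new(), // unknown group: nothing to deliver
        };
        let end = (*cursor).saturating_add(count).min(self.entries.len());
        let batch = self.entries[*cursor..end].to_vec();
        *cursor = end;
        batch
    }
}
```

With five entries appended, two successive `read_new("notifications", 3)` calls (standing in for `worker-a` and `worker-b`) return three and two entries, while `read_new("analytics", 10)` still returns all five. A group created with `$` after those appends sees nothing until the next `add`.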
+ +## Recovering crashed consumers with XAUTOCLAIM + +The demo's "Crash next 3" button tells a chosen consumer to drop its next three deliveries on the floor without acking them — the same effect as a worker process dying mid-message. Those entries stay in the group's PEL with their delivery counter incremented. Once they have been idle for at least `claim_min_idle_ms`, any healthy consumer in the group can rescue them by calling `XAUTOCLAIM` *with itself as the target*. `ConsumerWorker::reap_idle_pel` wraps that pattern +([source](https://github.com/redis/docs/blob/main/content/develop/use-cases/streaming/rust/consumer_worker.rs)): + +```rust +pub async fn reap_idle_pel(&self) -> ReapResult { + let (claimed, deleted) = match self + .stream + .autoclaim(&self.group, &self.name, 100, "0-0", 10) + .await + { + Ok(v) => v, + Err(_) => return ReapResult::default(), + }; + let mut processed: u64 = 0; + for (entry_id, fields) in claimed.iter() { + tokio::time::sleep(self.process_latency).await; + if self.handle_entry(entry_id.clone(), fields.clone()).await.is_ok() { + processed += 1; + } + } + ReapResult { + claimed: claimed.len() as u64, + processed, + deleted_ids: deleted, + } +} +``` + +The underlying `stream.autoclaim` helper pages through the group's PEL with `XAUTOCLAIM`'s continuation cursor. 
redis-rs 0.24 has no typed wrapper for `XAUTOCLAIM`, so the helper builds it from raw `redis::cmd("XAUTOCLAIM")` and decodes the three-element Redis 7+ reply by hand: + +```rust +pub async fn autoclaim( + &self, + group: &str, + consumer: &str, + page_count: usize, + start_id: &str, + max_pages: usize, +) -> RedisResult<(Vec, Vec)> { + let mut claimed_all: Vec = Vec::new(); + let mut deleted_all: Vec = Vec::new(); + let mut cursor = start_id.to_string(); + let mut conn = self.conn.clone(); + for _ in 0..max_pages { + let raw: Value = redis::cmd("XAUTOCLAIM") + .arg(&self.stream_key) + .arg(group) + .arg(consumer) + .arg(self.claim_min_idle_ms) + .arg(&cursor) + .arg("COUNT") + .arg(page_count) + .query_async(&mut conn) + .await?; + let (next_cursor, claimed, deleted) = parse_autoclaim_reply(raw)?; + claimed_all.extend(claimed); + deleted_all.extend(deleted); + if next_cursor == "0-0" { + break; + } + cursor = next_cursor; + } + Ok((claimed_all, deleted_all)) +} +``` + +A single `XAUTOCLAIM` call scans up to `page_count` PEL entries starting at `start_id`, reassigns the ones idle for at least `min_idle_time` to the named consumer, and returns a continuation cursor in the first slot of the reply. For a full sweep, loop until the cursor returns to `0-0` (with a `max_pages` safety net so one call cannot monopolise a very large PEL). The delivery counter is incremented on every claim — after a few cycles you can use it to spot a *poison-pill* message that crashes every consumer that touches it, and route it to a dead-letter stream so the bad entry stops cycling. (New entries keep flowing past the poison pill — `XREADGROUP >` still delivers fresh work — but the bad entry's repeated reclaim wastes consumer time and keeps the PEL larger than it needs to be.) + +The `deleted` list contains PEL entry IDs whose stream payload was already trimmed by the time the claim ran (typically because `MAXLEN ~` retention outran a slow consumer). 
`XAUTOCLAIM` removes those dangling slots from the PEL itself, so the caller does *not* need to `XACK` them — but the entries cannot be retried either, so log and route them to a dead-letter store for offline inspection. Redis 7.0 introduced this third return element; the example requires Redis 7.0+ for that reason. + +`reap_idle_pel` is the right primitive for the recovery path because it claims and processes in one step: every entry the call returned is now in *this* consumer's PEL, so the same consumer is responsible for processing and acking it. In production each consumer task runs `reap_idle_pel` periodically (every few seconds, on a timer) so a crashed peer's entries never sit invisibly. The demo exposes it as a manual button so you can trigger the reap after waiting for the idle threshold. + +`XCLAIM` (singular, no auto) does the same thing for a specific list of entry IDs you already have in hand — useful when you want to take ownership of one known stuck entry, or when you need to move a specific consumer's PEL to a peer (the case the demo's "Remove consumer" button handles via `handover_pending`). `XAUTOCLAIM` cannot filter by source consumer, so it cannot be used for a per-consumer handover. + +## Replay with XRANGE + +`XRANGE` reads a slice of history. It is completely independent of any consumer group — no cursors move, no acks happen — so it is safe to call any number of times, from any task: + +```rust +pub async fn replay( + &self, + start_id: &str, + end_id: &str, + count: usize, +) -> RedisResult> { + let mut conn = self.conn.clone(); + let reply: StreamRangeReply = conn + .xrange_count(&self.stream_key, start_id, end_id, count) + .await?; + Ok(stream_ids_to_entries(reply.ids)) +} +``` + +The special IDs `-` and `+` mean "from the very beginning" and "to the very end". You can also pass real IDs (`1716998413541-0`) or just the millisecond part (`1716998413541`, which Redis interprets as "any entry with this timestamp"). 
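As a rough sketch of the ID arithmetic, assuming you already have the current time in milliseconds, the helpers below build `XRANGE` bounds for a recent time window and split a full ID into its parts. `parse_stream_id` and `range_for_last` are hypothetical helper names, not part of `EventStream`.

```rust
/// Split a full stream ID such as "1716998413541-0" into
/// (milliseconds, sequence). Returns None if the string is not
/// in `<ms>-<seq>` form.
fn parse_stream_id(id: &str) -> Option<(u64, u64)> {
    let (ms, seq) = id.split_once('-')?;
    Some((ms.parse().ok()?, seq.parse().ok()?))
}

/// Build XRANGE bounds covering the last `window_ms` milliseconds.
/// The start is a bare millisecond value, which Redis treats as
/// `<ms>-0` when it is used as a range start; "+" means "up to the
/// newest entry".
fn range_for_last(now_ms: u64, window_ms: u64) -> (String, String) {
    // saturating_sub keeps the start at 0 if the window exceeds now_ms.
    let start_ms = now_ms.saturating_sub(window_ms);
    (start_ms.to_string(), "+".to_string())
}
```

The resulting pair could be passed to `stream.replay(&start, &end, 100)`. Because auto-generated IDs come from the Redis server's clock, the window is approximate if the caller's clock differs.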
+ +Typical uses: + +* **Bootstrapping a new projection** — read the entire stream from `-` and build a derived view in another store (a search index, a SQL table, a different cache). Doing this against a consumer group would consume the entries; `XRANGE` lets you do it without disrupting live consumers. +* **Auditing recent activity** — read the last few minutes by ID range without touching any group cursor. +* **Debugging** — fetch one specific entry by its ID, or a tight range around an incident timestamp, to see exactly what producers wrote. + +## The consumer worker task + +`ConsumerWorker` wraps the `XREADGROUP` → process → `XACK` loop in a spawned tokio task. The main loop reads new entries with `XREADGROUP >`, processes each entry with a small sleep to simulate work, then acks it. If the demo has asked the worker to drop the next *n* deliveries, those entries are recorded as dropped and left in the PEL so `XAUTOCLAIM` can recover them later: + +```rust +async fn run_loop(self: Arc) { + loop { + let (stop, paused) = { + let inner = self.inner.lock().await; + (inner.stop, inner.paused) + }; + if stop { + return; + } + if paused { + tokio::time::sleep(Duration::from_millis(50)).await; + continue; + } + + let entries: Vec = + match self.stream.consume(&self.group, &self.name, 10, 500).await { + Ok(v) => v, + Err(err) => { + eprintln!("[{}/{}] read failed: {}", self.group, self.name, err); + tokio::time::sleep(Duration::from_millis(500)).await; + continue; + } + }; + for (entry_id, fields) in entries { + self.dispatch(entry_id, fields).await; + } + } +} +``` + +`dispatch` wraps `handle_entry` so an `XACK` failure (a Redis hiccup) logs a "handler error" entry and continues, rather than tearing down the spawned task. Letting a Redis error bubble through `spawn`'s panic boundary would silently halt the consumer while every other entry sat in its PEL waiting for `XAUTOCLAIM`. 
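The containment idea can be shown without Redis or tokio. The sketch below is a synchronous, std-only analogue: `dispatch_all` and `BatchOutcome` are hypothetical names, and the `handle` closure stands in for the real process-then-`XACK` step. It is not the demo's `dispatch` implementation.

```rust
/// Outcome of one batch; a hypothetical type for this sketch.
#[derive(Debug, Default, PartialEq)]
struct BatchOutcome {
    handled: usize,
    errors: Vec<String>,
}

/// Run `handle` on every entry ID. A failure is recorded and the loop
/// moves on, so one failed entry never stops the rest of the batch.
fn dispatch_all<F>(entry_ids: &[&str], mut handle: F) -> BatchOutcome
where
    F: FnMut(&str) -> Result<(), String>,
{
    let mut outcome = BatchOutcome::default();
    for &id in entry_ids {
        match handle(id) {
            Ok(()) => outcome.handled += 1,
            // Keep the error for the log instead of propagating it. In
            // the real flow the entry would stay pending, so a later
            // claim sweep could retry it.
            Err(err) => outcome.errors.push(format!("{}: {}", id, err)),
        }
    }
    outcome
}
```

The design choice is that errors become data (a log line, a counter) rather than control flow, which keeps the read loop alive through transient failures.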
+ +Recovery of stuck PEL entries — this consumer's, after a restart, or another consumer's, after a crash — runs through a separate `reap_idle_pel` method rather than the read loop. That method calls `XAUTOCLAIM` with this consumer as the target, then processes whatever was claimed in the same flow as new entries. This is the textbook Streams pattern: each consumer is its own reaper, running `XAUTOCLAIM(self)` periodically (or on demand) so a crashed peer's entries never sit invisibly in the PEL. The demo's "XAUTOCLAIM to selected" button calls `reap_idle_pel` on the chosen consumer; in production you would run it from a timer every few seconds. + +Note that the worker's main read loop deliberately does *not* call `XREADGROUP 0` to drain its own PEL on every iteration. That would re-deliver every pending entry continuously and *reset its idle counter to zero* each time, which would keep crashed entries below the `XAUTOCLAIM` threshold forever. Using `XAUTOCLAIM(self)` as the recovery primitive — which only fires for entries idle longer than `min_idle_time` — avoids that whole class of bug. + +The pause and crash levers exist only for the demo. A real consumer is just the read-process-ack loop — everything else in this class is instrumentation. + +## Prerequisites + +* Redis 7.0 or later. `XAUTOCLAIM` was added in Redis 6.2, but its reply gained a third + element (the list of deleted IDs) in 7.0; the example relies on that shape. +* Rust 1.70 or later (stable). +* The `redis` crate at version 0.24+, with features `["tokio-comp", "aio", "connection-manager", "streams"]`. The `connection-manager` feature gives `ConnectionManager` for cheap, cloneable, auto-reconnecting connections; `streams` adds the typed reply structs for `XINFO`/`XPENDING`. 
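A startup check along these lines can catch an older server early. This is a sketch, assuming the version string has already been read from the `redis_version` field of `INFO server`; `parse_major_minor` and `supports_deleted_ids` are hypothetical helper names and are not part of the demo.

```rust
/// Parse the leading "major.minor" of a Redis version string such as
/// "7.2.4". Returns None if either part is missing or not a number.
fn parse_major_minor(version: &str) -> Option<(u32, u32)> {
    let mut parts = version.trim().split('.');
    let major = parts.next()?.parse().ok()?;
    let minor = parts.next()?.parse().ok()?;
    Some((major, minor))
}

/// True when the server is new enough (7.0+) to return the
/// three-element XAUTOCLAIM reply that includes deleted IDs.
fn supports_deleted_ids(version: &str) -> bool {
    // Tuples compare lexicographically: major first, then minor.
    matches!(parse_major_minor(version), Some(v) if v >= (7, 0))
}
```

An unparseable version string is treated as unsupported here, which errs on the side of refusing to start rather than misreading the reply shape.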
+ +The `Cargo.toml` for this demo pins all the runtime crates: + +```toml +[dependencies] +redis = { version = "0.24", features = ["tokio-comp", "aio", "connection-manager", "streams"] } +tokio = { version = "1", features = ["full"] } +axum = "0.7" +serde = { version = "1.0", features = ["derive"] } +serde_json = "1.0" +rand = "0.8" +``` + +If your Redis server is running elsewhere, start the demo with `--redis-host` and `--redis-port`. + +## Running the demo + +### Get the source files + +The demo consists of four files. Download them from the [`rust` source folder](https://github.com/redis/docs/tree/main/content/develop/use-cases/streaming/rust) on GitHub, or grab them with `curl`: + +```bash +mkdir streaming-demo && cd streaming-demo +BASE=https://raw.githubusercontent.com/redis/docs/main/content/develop/use-cases/streaming/rust +curl -O $BASE/Cargo.toml +curl -O $BASE/event_stream.rs +curl -O $BASE/consumer_worker.rs +curl -O $BASE/demo_server.rs +``` + +### Start the demo server + +From that directory: + +```bash +cargo run --release +``` + +You should see: + +```text +Deleting any existing data at key 'demo:events:orders' for a clean demo run (pass --no-reset to keep it). +Redis streaming demo server listening on http://127.0.0.1:8788 +Using Redis at localhost:6379 with stream key 'demo:events:orders' (MAXLEN ~ 2000) +Seeded 3 consumer(s) across 2 group(s) +``` + +By default the demo wipes the configured stream key on startup so each run starts from a clean state. Pass `--no-reset` to keep any existing data at the key (useful when re-running against the same stream to inspect prior state), or `--stream-key ` to point the demo at a different key entirely. + +Open [http://127.0.0.1:8788](http://127.0.0.1:8788) in a browser. You can: + +* **Produce** any number of events of a chosen type (or random types). Watch the stream length grow and the tail update. 
+* See each **consumer group**: its `last-delivered-id`, the size of its pending list, and the consumers in it. Each consumer shows its processed count, pending count, and idle time. +* **Add or remove** consumers within a group at runtime to see Redis split the work across the new shape. +* Click **Crash next 3** on a consumer to drop its next three deliveries — the same effect as a worker process dying after `XREADGROUP` but before `XACK`. Watch the **Pending entries (XPENDING)** panel fill up. +* Wait until the idle time exceeds the threshold (default 5000 ms), pick a healthy target consumer, and click **XAUTOCLAIM to selected** — the stuck entries are reassigned and the delivery counter increments. +* **Replay (XRANGE)** any range to confirm the full history is independent of consumer-group state. +* **XTRIM** with an approximate `MAXLEN` to bound retention. Note that an approximate trim only releases whole macro-nodes — `MAXLEN ~ 50` on a small stream may not delete anything; on a 300-entry stream it typically lands at around 100. +* Click **Reset demo** to drop the stream and re-seed the default groups. + +## Production usage + +### Use one Redis connection per blocking consumer in real workloads + +`redis-rs`'s `ConnectionManager` wraps a single multiplexed connection. That's the right default for short non-blocking commands — `XADD`, `XACK`, `XPENDING`, `XINFO` — but blocking calls like `XREADGROUP ... BLOCK 500` tie up the server side of the connection for the block duration. If many consumers share one `ConnectionManager`, their `XREADGROUP` calls effectively serialise: while Redis is parked waiting for the first consumer's block to expire (or deliver), the second consumer's request can't even leave the local pipeline. 
For the demo this is fine (three consumers reading an idle stream produce a steady 1.5s round-trip), but a production deployment with N consumers should hold one `ConnectionManager` per consumer (or per small fan-out) so the block windows don't fight each other. + +### Atomically reserve consumer names in the in-process registry + +The demo keeps an in-process registry of `(group, name) → ConsumerWorker` so the HTTP routes can find a worker by name, route a "crash next 3" request to it, and move its PEL on remove. Two concurrent `add-worker` requests with the same name must produce **one** success and **one** "already exists" error, not two registry entries with one orphaned. Because the registry is mutated from inside `async` HTTP handlers, the duplicate-check has to be atomic against any concurrent caller. + +The pitfall here is that a `std::sync::MutexGuard` is `!Send`, so a future that holds one across `.await` is itself not `Send` and is rejected wherever a `Send` future is required (for example by `tokio::spawn`); the natural "check the map → release the lock → spawn the worker → re-acquire and insert" workaround leaves a window where two callers can both pass the check and both insert. The demo uses a `tokio::sync::Mutex` instead and holds it across the whole reservation + spawn sequence: + +```rust +async fn add_worker(self: &Arc<Self>, group: &str, name: &str) -> bool { + let key: WorkerKey = (group.to_string(), name.to_string()); + let mut guard = self.workers.lock().await; + if guard.contains_key(&key) { + return false; + } + let _ = self.stream.ensure_group(group, "0-0").await; + let worker = ConsumerWorker::new(self.stream.clone(), group, name); + worker.start().await; + guard.insert(key, worker); + true +} +``` + +The guard returned by `tokio::sync::Mutex` is `Send` (as long as the protected value is), so the compiler is happy to keep it alive across `.await`. The result is that the name is reserved, the worker is spawned, and the insert lands while no other caller can race in. A second concurrent call with the same name waits on the lock, sees the entry, and returns `false`.
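The same check-and-insert-under-one-lock idea can be illustrated with the standard library alone. The sketch below is a synchronous analogue, not the demo's code: `Registry`, `reserve`, and `race_for_name` are hypothetical names, and threads stand in for concurrent HTTP handlers. Since nothing here awaits while holding the guard, a `std::sync::Mutex` is sufficient.

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for the demo's worker registry: a set of
/// (group, consumer-name) keys behind one lock.
struct Registry {
    names: Mutex<HashSet<(String, String)>>,
}

impl Registry {
    fn new() -> Self {
        Registry { names: Mutex::new(HashSet::new()) }
    }

    /// Reserve a name. The duplicate check and the insert happen under
    /// the same lock acquisition, so two racing callers cannot both win.
    fn reserve(&self, group: &str, name: &str) -> bool {
        let mut guard = self.names.lock().unwrap();
        // HashSet::insert returns false when the key was already present.
        guard.insert((group.to_string(), name.to_string()))
    }
}

/// Race `callers` threads for the same name and count the winners.
fn race_for_name(callers: usize) -> usize {
    let registry = Arc::new(Registry::new());
    let handles: Vec<_> = (0..callers)
        .map(|_| {
            let registry = Arc::clone(&registry);
            thread::spawn(move || registry.reserve("notifications", "worker-a"))
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().unwrap())
        .filter(|won| *won)
        .count()
}
```

Calling `race_for_name(8)` reports exactly one winner, because `HashSet::insert` performs the duplicate check and the insert inside a single critical section.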
+ +### Pick retention by length or by minimum ID + +The demo uses `MAXLEN ~` on every `XADD`. Two alternatives are worth considering: + +* `MINID ~ ` — keep only entries newer than an ID. If you want "the last 24 hours", compute the wall-clock cutoff and call `stream.trim_minid("-0").await?`. This is the right pattern when retention is time-bounded. +* No cap on `XADD` plus a periodic `XTRIM` task — useful if your producer is hot and the per-`XADD` work has to stay minimal, or if retention rules are complex (a separate task can also factor in consumer-group lag). + +In all three cases the trimming is approximate by default. Use exact trimming (`MAXLEN n` or `MINID id` without `~`) only when you genuinely need an exact count. + +### Don't let consumer-group lag silently grow + +`XINFO GROUPS` reports each group's `lag` (entries the group has not yet read) and `pending` (entries delivered but not acked). In production, alert on either of these crossing a threshold — a steadily growing pending count usually means consumers are crashing without `XAUTOCLAIM` running, and a growing lag means consumers can't keep up with producers. + +The same applies inside a group: `XINFO CONSUMERS` reports per-consumer pending counts and idle times, so you can spot one slow consumer holding entries that the rest of the group is waiting on. + +### Make consumer logic idempotent + +`XAUTOCLAIM` can re-deliver an entry to a different consumer after a crash. If your processing has side effects (sending email, charging a card, updating a downstream store), make sure the same entry processed twice gives the same result — use an idempotency key, an upsert with conditional check, or a once-per-id guard table. Redis Streams cannot give you exactly-once semantics on its own. + +### Bound the delivery counter as a poison-pill signal + +`XPENDING` returns each entry's delivery count, incremented on every claim. 
If an entry has been delivered (and dropped) several times, the next consumer is unlikely to fare better. After some threshold — `deliveries >= 5`, say — route the entry to a *dead-letter stream*, ack it on the original group, and alert. New entries keep flowing past a poison pill (`XREADGROUP >` still delivers fresh work), but the bad entry's repeated reclaim wastes consumer time and keeps the PEL bigger than it needs to be — without a DLQ threshold it can also slowly trip retention/lag alerts. + +### Partition by tenant or entity for scale + +A single Redis Stream is a single key, and on a Redis Cluster a single key lives on a single shard. If your throughput exceeds what one shard can handle, partition the stream — for example by tenant ID (`events:orders:{tenant_a}`, `events:orders:{tenant_b}`) — so different tenants land on different shards. Hash-tags (`{tenant_a}`) keep all related streams on the same shard if you need to multi-stream atomically. + +Per-entity partitioning (`events:order:{order_id}`) is the canonical pattern when you treat each entity's stream as the event-sourcing log for that entity: every state change for one order goes on its own stream, which is also bounded in size by the entity's lifetime. + +### Use a separate consumer pool per group + +The demo runs every consumer in one process. In production each consumer group is usually its own deployment — its own pool of pods or VMs — so a slow projection in `analytics` cannot pull `notifications` workers off their stream. Each pod runs one consumer task per CPU core, with `XAUTOCLAIM` either embedded in the consumer loop (every N reads, claim idle entries to self) or run by a separate reaper task. + +### Don't read with XREAD (no group) and then try to ack + +`XREAD` and `XREADGROUP` are different mechanisms. `XREAD` is a tail-the-log read with no consumer-group state — entries are not added to any PEL, and you cannot `XACK` them. 
If you want at-least-once delivery and crash recovery, you must read through a consumer group. + +`XREAD` is still useful for read-only tail clients (a UI streaming events, a debugger, a `tail -f`-style command-line tool). It's just not part of the at-least-once path. + +### Inspect the stream directly with redis-cli + +When testing or troubleshooting, inspect the stream directly to confirm the consumer state is what you expect: + +```bash +# Stream summary +redis-cli XLEN demo:events:orders +redis-cli XINFO STREAM demo:events:orders + +# Group cursors and pending counts +redis-cli XINFO GROUPS demo:events:orders + +# Consumers within a group +redis-cli XINFO CONSUMERS demo:events:orders notifications + +# Pending entries with idle time and delivery count +redis-cli XPENDING demo:events:orders notifications - + 20 + +# Tail the stream live (no consumer-group state — like tail -f) +redis-cli XREAD BLOCK 0 STREAMS demo:events:orders '$' + +# Replay a range +redis-cli XRANGE demo:events:orders - + COUNT 50 +``` + +If a group's `lag` is growing while consumers' `idle` times are short, consumers are healthy but producers are outpacing them — add more consumers. If `pending` is growing while `lag` is small, consumers are *receiving* entries but not *acking* them — either they are crashing mid-message or your acking logic has a bug. + +## Learn more + +This example uses the following Redis commands: + +* [`XADD`]({{< relref "/commands/xadd" >}}) to append an event with an approximate `MAXLEN` cap. +* [`XREADGROUP`]({{< relref "/commands/xreadgroup" >}}) to read new entries for a consumer in a group. +* [`XACK`]({{< relref "/commands/xack" >}}) to acknowledge a processed entry. +* [`XAUTOCLAIM`]({{< relref "/commands/xautoclaim" >}}) to reassign idle pending entries to a healthy consumer. 
+* [`XCLAIM`]({{< relref "/commands/xclaim" >}}) to take ownership of a specific list of pending entry IDs by hand (used by `handover_pending` to move a leaving consumer's PEL to a peer, since `XAUTOCLAIM` has no source-consumer filter). +* [`XRANGE`]({{< relref "/commands/xrange" >}}) for replay and audit, independent of consumer-group state. +* [`XPENDING`]({{< relref "/commands/xpending" >}}) to inspect the per-group pending list with idle times and delivery counts. +* [`XTRIM`]({{< relref "/commands/xtrim" >}}) for explicit retention enforcement. +* [`XGROUP CREATE`]({{< relref "/commands/xgroup-create" >}}) and + [`XGROUP DELCONSUMER`]({{< relref "/commands/xgroup-delconsumer" >}}) to manage groups and consumers. +* [`XINFO STREAM`]({{< relref "/commands/xinfo-stream" >}}), + [`XINFO GROUPS`]({{< relref "/commands/xinfo-groups" >}}), and + [`XINFO CONSUMERS`]({{< relref "/commands/xinfo-consumers" >}}) for observability. + +See the [`redis-rs` documentation](https://docs.rs/redis/) for the full client reference, and the [Streams overview]({{< relref "/develop/data-types/streams" >}}) for the deeper conceptual model — consumer groups, the PEL, claim semantics, capped streams, and the differences with Kafka partitions. diff --git a/content/develop/use-cases/streaming/rust/consumer_worker.rs b/content/develop/use-cases/streaming/rust/consumer_worker.rs new file mode 100644 index 0000000000..d4f1228352 --- /dev/null +++ b/content/develop/use-cases/streaming/rust/consumer_worker.rs @@ -0,0 +1,393 @@ +//! Background consumer task for a single consumer in a consumer group. +//! +//! Each worker owns a tokio task that loops on `XREADGROUP >` with a +//! short block timeout and acks every entry it processes. Recovery of +//! stuck PEL entries (this consumer's, or anyone else's) happens +//! through `reap_idle_pel()`, which is the textbook Streams pattern: +//! each consumer periodically (or on demand) calls `XAUTOCLAIM` with +//! 
itself as the target, then processes whatever it claimed. The +//! demo's "XAUTOCLAIM to selected" button is exactly that call. +//! +//! Two demo-only levers are wired into the loop: +//! +//! * `pause()` parks the worker (so its pending entries age into the +//! `XAUTOCLAIM` window without being consumed by `>` reads). +//! * `crash_next(n)` tells the worker to drop its next `n` deliveries +//! on the floor without acking them, the same effect as a worker +//! process dying mid-message. Those entries stay in the group's PEL +//! until `reap_idle_pel` recovers them. +//! +//! Real consumers do not need either lever; they only need +//! `XREADGROUP` → process → `XACK` in `run_loop` and a periodic +//! `reap_idle_pel` call to recover stuck entries. + +use std::collections::{HashMap, VecDeque}; +use std::sync::Arc; +use std::time::Duration; + +use serde::Serialize; +use tokio::sync::Mutex; +use tokio::task::JoinHandle; + +use crate::event_stream::{Entry, EventStream}; + +/// One row in the worker's "recent activity" tail. The demo UI renders +/// it as a small badge stack so you can see whether the entry was +/// acked or dropped on the floor. +#[derive(Debug, Clone, Serialize)] +pub struct RecentEntry { + pub id: String, + #[serde(rename = "type")] + pub event_type: String, + pub fields: HashMap<String, String>, + pub acked: bool, + pub note: String, +} + +#[derive(Debug, Clone, Default, Serialize)] +pub struct ConsumerStatus { + pub name: String, + pub group: String, + pub processed: u64, + pub reaped: u64, + pub crashed_drops: u64, + pub paused: bool, + pub crash_queued: u64, + pub alive: bool, +} + +/// Result of one `XAUTOCLAIM(self) + process` pass. 
+#[derive(Debug, Clone, Serialize, Default)] +pub struct ReapResult { + pub claimed: u64, + pub processed: u64, + pub deleted_ids: Vec<String>, +} + +#[derive(Default)] +struct WorkerInner { + recent: VecDeque<RecentEntry>, + processed: u64, + reaped: u64, + crashed_drops: u64, + crash_next: u64, + paused: bool, + stop: bool, +} + +/// One consumer in a consumer group, running on its own tokio task. +/// +/// Cheap to clone (`Arc<...>`). Hold one per registered consumer in the +/// demo's `(group, name)` registry. +pub struct ConsumerWorker { + stream: EventStream, + pub group: String, + pub name: String, + process_latency: Duration, + recent_capacity: usize, + inner: Mutex<WorkerInner>, + handle: Mutex<Option<JoinHandle<()>>>, +} + +impl ConsumerWorker { + pub fn new( + stream: EventStream, + group: impl Into<String>, + name: impl Into<String>, + ) -> Arc<Self> { + Arc::new(Self { + stream, + group: group.into(), + name: name.into(), + process_latency: Duration::from_millis(25), + recent_capacity: 20, + inner: Mutex::new(WorkerInner::default()), + handle: Mutex::new(None), + }) + } + + // ------------------------------------------------------------------ + // Lifecycle + // ------------------------------------------------------------------ + + /// Spawn the read-process-ack loop on a tokio task. Calling + /// `start()` on a worker that's already running is a no-op. + pub async fn start(self: &Arc<Self>) { + let mut handle_guard = self.handle.lock().await; + if let Some(h) = handle_guard.as_ref() { + if !h.is_finished() { + return; + } + } + { + let mut inner = self.inner.lock().await; + inner.stop = false; + } + let me = Arc::clone(self); + let handle = tokio::spawn(async move { + me.run_loop().await; + }); + *handle_guard = Some(handle); + } + + /// Set the stop flag and wait briefly for the task to exit. We do + /// not `abort()` the task; letting it exit at the next loop iteration + /// lets any in-flight ack complete. 
+ pub async fn stop(&self) { + { + let mut inner = self.inner.lock().await; + inner.stop = true; + } + let mut handle_guard = self.handle.lock().await; + if let Some(h) = handle_guard.as_mut() { + // The read loop blocks for up to 500ms in `consume`; give + // it a beat to wake up and check the stop flag. + let _ = tokio::time::timeout(Duration::from_secs(2), h).await; + } + *handle_guard = None; + } + + // ------------------------------------------------------------------ + // Demo levers + // ------------------------------------------------------------------ + + #[allow(dead_code)] + pub async fn pause(&self) { + let mut inner = self.inner.lock().await; + inner.paused = true; + } + + #[allow(dead_code)] + pub async fn resume(&self) { + let mut inner = self.inner.lock().await; + inner.paused = false; + } + + /// Drop the next `count` deliveries without acking them. + /// + /// The entries stay in the group's PEL with their delivery counter + /// incremented, so `XAUTOCLAIM` can recover them once they exceed + /// the idle threshold. 
+ pub async fn crash_next(&self, count: u64) { + let mut inner = self.inner.lock().await; + inner.crash_next = inner.crash_next.saturating_add(count); + } + + // ------------------------------------------------------------------ + // Introspection + // ------------------------------------------------------------------ + + pub async fn recent(&self) -> Vec<RecentEntry> { + let inner = self.inner.lock().await; + inner.recent.iter().cloned().collect() + } + + pub async fn status(&self) -> ConsumerStatus { + let inner = self.inner.lock().await; + let alive = { + let handle = self.handle.lock().await; + handle.as_ref().map(|h| !h.is_finished()).unwrap_or(false) + }; + ConsumerStatus { + name: self.name.clone(), + group: self.group.clone(), + processed: inner.processed, + reaped: inner.reaped, + crashed_drops: inner.crashed_drops, + paused: inner.paused, + crash_queued: inner.crash_next, + alive, + } + } + + // ------------------------------------------------------------------ + // Recovery + // ------------------------------------------------------------------ + + /// Run `XAUTOCLAIM` into self and process the claimed entries. + /// + /// Returns a summary with the `claimed` and `processed` counts and + /// the `deleted_ids` list. Safe to call from any task; the heavy + /// lifting is `EventStream::autoclaim` (a Redis call) plus the + /// per-entry handling via `handle_entry`. + /// + /// `deleted_ids` are PEL entries whose stream payload was already + /// trimmed by `MAXLEN ~` / `XTRIM` before the sweep ran. Redis 7+ + /// removes them from the PEL inside `XAUTOCLAIM` itself, so the + /// caller does not have to `XACK` them; they are reported so the + /// caller can route them to a dead-letter store. 
+ pub async fn reap_idle_pel(&self) -> ReapResult { + let (claimed, deleted) = match self + .stream + .autoclaim(&self.group, &self.name, 100, "0-0", 10) + .await + { + Ok(v) => v, + Err(err) => { + eprintln!( + "[{}/{}] reap: XAUTOCLAIM failed: {}", + self.group, self.name, err + ); + return ReapResult::default(); + } + }; + let mut processed: u64 = 0; + for (entry_id, fields) in claimed.iter() { + // Reap path: handle each entry inline. Sleep first to match + // the read-loop processing latency, then ack-or-drop in the + // same code path the read loop uses. + tokio::time::sleep(self.process_latency).await; + match self.handle_entry(entry_id.clone(), fields.clone()).await { + Ok(()) => { + processed += 1; + } + Err(err) => { + eprintln!( + "[{}/{}] reap failed on {}: {}", + self.group, self.name, entry_id, err + ); + } + } + } + { + let mut inner = self.inner.lock().await; + inner.reaped = inner.reaped.saturating_add(processed); + } + ReapResult { + claimed: claimed.len() as u64, + processed, + deleted_ids: deleted, + } + } + + // ------------------------------------------------------------------ + // Main loop + // ------------------------------------------------------------------ + + async fn run_loop(self: Arc<Self>) { + loop { + // Snapshot the demo-flag state under one short lock to keep + // the hot path lock-free. + let (stop, paused) = { + let inner = self.inner.lock().await; + (inner.stop, inner.paused) + }; + if stop { + return; + } + if paused { + tokio::time::sleep(Duration::from_millis(50)).await; + continue; + } + + let entries: Vec<Entry> = + match self.stream.consume(&self.group, &self.name, 10, 500).await { + Ok(v) => v, + Err(err) => { + // Don't kill the task on a transient Redis + // error; a real consumer would log this and + // back off. 
+ eprintln!( + "[{}/{}] read failed: {}", + self.group, self.name, err + ); + tokio::time::sleep(Duration::from_millis(500)).await; + continue; + } + }; + + for (entry_id, fields) in entries { + self.dispatch(entry_id, fields).await; + } + } + } + + async fn dispatch(&self, entry_id: String, fields: HashMap<String, String>) { + if !self.process_latency.is_zero() { + tokio::time::sleep(self.process_latency).await; + } + if let Err(err) = self.handle_entry(entry_id.clone(), fields.clone()).await { + // A failure here (typically XACK against Redis) must not + // kill the spawned task: that would silently halt this + // consumer while every other entry sat in its PEL waiting + // for XAUTOCLAIM. The entry stays unacked; the next + // `reap_idle_pel` call (here or on any consumer in the + // group) can recover it once it exceeds the idle threshold. + eprintln!( + "[{}/{}] failed to handle {}: {}", + self.group, self.name, entry_id, err + ); + let event_type = fields.get("type").cloned().unwrap_or_default(); + let entry = RecentEntry { + id: entry_id, + event_type, + fields, + acked: false, + note: format!("handler error: {}", err), + }; + self.push_recent(entry).await; + } + } + + /// Handle one entry. Returns `Err` only if Redis itself failed + /// (so the caller can log and *not* tear the task down). A drop + /// (simulated crash) is success from the task's point of view: + /// the entry stays in the PEL on purpose. + async fn handle_entry( + &self, + entry_id: String, + fields: HashMap<String, String>, + ) -> Result<(), redis::RedisError> { + // Pull the drop decision under one short lock, then act. 
+ let drop = { + let mut inner = self.inner.lock().await; + if inner.crash_next > 0 { + inner.crash_next -= 1; + true + } else { + false + } + }; + + if drop { + let event_type = fields.get("type").cloned().unwrap_or_default(); + let entry = RecentEntry { + id: entry_id, + event_type, + fields, + acked: false, + note: "dropped (simulated crash)".to_string(), + }; + { + let mut inner = self.inner.lock().await; + inner.crashed_drops = inner.crashed_drops.saturating_add(1); + } + self.push_recent(entry).await; + return Ok(()); + } + + self.stream.ack(&self.group, vec![entry_id.clone()]).await?; + let event_type = fields.get("type").cloned().unwrap_or_default(); + let entry = RecentEntry { + id: entry_id, + event_type, + fields, + acked: true, + note: String::new(), + }; + { + let mut inner = self.inner.lock().await; + inner.processed = inner.processed.saturating_add(1); + } + self.push_recent(entry).await; + Ok(()) + } + + async fn push_recent(&self, entry: RecentEntry) { + let mut inner = self.inner.lock().await; + inner.recent.push_front(entry); + while inner.recent.len() > self.recent_capacity { + inner.recent.pop_back(); + } + } +} diff --git a/content/develop/use-cases/streaming/rust/demo_server.rs b/content/develop/use-cases/streaming/rust/demo_server.rs new file mode 100644 index 0000000000..9dee2c6580 --- /dev/null +++ b/content/develop/use-cases/streaming/rust/demo_server.rs @@ -0,0 +1,1261 @@ +//! Redis streaming demo server. +//! +//! Run this file and visit http://localhost:8788 to watch a Redis Stream +//! in action: producers append events to a single stream, two +//! independent consumer groups read the same stream at their own pace, +//! and within the `notifications` group two consumers share the work. +//! +//! Use the UI to: +//! +//! * Produce events into the stream. +//! * Watch each consumer group's last-delivered ID, PEL count, and the +//! consumers inside it. +//! * Drop the next `N` messages from a chosen consumer to simulate a +//! 
crash mid-processing, then run `XAUTOCLAIM` to reassign the stuck +//! entries to a healthy consumer. +//! * Replay any ID range with `XRANGE` to confirm the history is +//! independent of consumer-group state. +//! * Trim the stream with `XTRIM` to bound retention. + +mod consumer_worker; +mod event_stream; + +use std::collections::{HashMap, HashSet}; +use std::env; +use std::sync::Arc; + +use axum::{ + extract::{Form, Query, State}, + http::{header, StatusCode}, + response::{IntoResponse, Response}, + routing::{get, post}, + Json, Router, +}; +use rand::seq::SliceRandom; +use rand::Rng; +use redis::aio::ConnectionManager; +use redis::Client; +use serde::Deserialize; +use serde_json::{json, Value}; +use tokio::sync::Mutex; + +use consumer_worker::{ConsumerStatus, ConsumerWorker}; +use event_stream::EventStream; + +const EVENT_TYPES: &[&str] = &[ + "order.placed", + "order.paid", + "order.shipped", + "order.cancelled", +]; +const CUSTOMERS: &[&str] = &["alice", "bob", "carol", "dan", "erin"]; + +fn default_groups() -> Vec<(&'static str, Vec<&'static str>)> { + vec![ + ("notifications", vec!["worker-a", "worker-b"]), + ("analytics", vec!["worker-c"]), + ] +} + +type WorkerKey = (String, String); + +/// In-memory registry of consumer workers across all groups. +/// +/// Every mutation goes through `lock` — the registry mutex is held for +/// the entire check + spawn + insert sequence so two concurrent calls +/// to `add_worker` for the same `(group, name)` cannot both succeed. +/// Audit-checklist row 18 (Concurrent-name reservation race in async +/// helpers): holding a `tokio::sync::Mutex` across `.await` lets the +/// reservation be atomic against any concurrent caller, since +/// `tokio::sync::Mutex` is `Send` (unlike `std::sync::Mutex`, which we +/// could not hold across the `start().await` inside `add_worker`). +struct StreamingDemo { + stream: EventStream, + /// Single mutex guards both the registry and any in-flight + /// reservation. 
`add_worker` holds it for the whole check + spawn + insert + /// sequence; `remove_worker` releases it during the PEL handover. + workers: Mutex<HashMap<WorkerKey, Arc<ConsumerWorker>>>, +} + +impl StreamingDemo { + fn new(stream: EventStream) -> Arc<Self> { + Arc::new(Self { + stream, + workers: Mutex::new(HashMap::new()), + }) + } + + async fn seed(self: &Arc<Self>, groups: &[(&str, Vec<&str>)]) -> usize { + let mut total = 0; + for (group, names) in groups { + // Ignore errors: duplicate-create returns BUSYGROUP which + // `ensure_group` already swallows, and a connection error + // here will surface again on the first XADD. + let _ = self.stream.ensure_group(group, "0-0").await; + for name in names { + let added = self.add_worker(group, name).await; + if added { + total += 1; + } + } + } + total + } + + /// Atomically reserve `(group, name)` and start the worker. + /// + /// Returns `true` on success, `false` if a worker with that name + /// already exists. The registry mutex is held across the + /// `worker.start().await` so two concurrent calls with the same + /// name cannot both insert. + async fn add_worker(self: &Arc<Self>, group: &str, name: &str) -> bool { + let key: WorkerKey = (group.to_string(), name.to_string()); + let mut guard = self.workers.lock().await; + if guard.contains_key(&key) { + return false; + } + // ensure_group is idempotent; cheap to call from here. + let _ = self.stream.ensure_group(group, "0-0").await; + let worker = ConsumerWorker::new(self.stream.clone(), group, name); + worker.start().await; + guard.insert(key, worker); + true + } + + /// Remove a consumer safely. + /// + /// `XGROUP DELCONSUMER` destroys the consumer's PEL entries + /// outright, so any pending message it still owned would become + /// unreachable. Before deleting, hand its PEL off to another + /// consumer in the same group with `XCLAIM`. Without a peer + /// consumer to take over, refuse to delete and leave the worker in + /// place so the user can add a peer first. 
+ async fn remove_worker(self: &Arc<Self>, group: &str, name: &str) -> RemoveResult { + let key: WorkerKey = (group.to_string(), name.to_string()); + // Find the worker and a peer under the lock but DO NOT remove + // yet. We release the lock for the (potentially slow) handover + // so /state polls don't queue behind it, but the worker stays + // in the registry so a failed handover doesn't strand a + // half-removed consumer. XGROUP DELCONSUMER destroys the + // source's PEL, so only run it after handover has succeeded. + let (worker, peer) = { + let guard = self.workers.lock().await; + let Some(worker) = guard.get(&key).cloned() else { + return RemoveResult::not_found(); + }; + let peer: Option<String> = guard + .keys() + .find(|(g, n)| g == group && n != name) + .map(|(_, n)| n.clone()); + let Some(peer) = peer else { + return RemoveResult::no_peer(group, name); + }; + (worker, peer) + }; + + let handed_over = match self + .stream + .handover_pending(group, name, &peer, 100) + .await + { + Ok(n) => n, + Err(err) => { + return RemoveResult::handover_failed(group, name, &peer, &err.to_string()); + } + }; + + // Handover succeeded; now safe to remove from the registry, + // stop the worker, and destroy the consumer record in Redis. 
+ { + let mut guard = self.workers.lock().await; + guard.remove(&key); + } + worker.stop().await; + let _ = self.stream.delete_consumer(group, name).await; + RemoveResult::removed(&peer, handed_over) + } + + async fn get_worker(&self, group: &str, name: &str) -> Option<Arc<ConsumerWorker>> { + let guard = self.workers.lock().await; + guard + .get(&(group.to_string(), name.to_string())) + .cloned() + } + + async fn snapshot(&self) -> Vec<(WorkerKey, Arc<ConsumerWorker>)> { + let guard = self.workers.lock().await; + guard + .iter() + .map(|(k, v)| (k.clone(), v.clone())) + .collect() + } + + async fn stop_all(&self) { + let workers: Vec<Arc<ConsumerWorker>> = { + let mut guard = self.workers.lock().await; + let drained: Vec<Arc<ConsumerWorker>> = guard.values().cloned().collect(); + guard.clear(); + drained + }; + for worker in workers { + worker.stop().await; + } + } + + async fn reset(self: &Arc<Self>) -> usize { + self.stop_all().await; + self.stream.delete_stream().await; + self.stream.reset_stats(); + let groups = default_groups(); + let groups_ref: Vec<(&str, Vec<&str>)> = groups + .iter() + .map(|(g, n)| (*g, n.clone())) + .collect(); + self.seed(&groups_ref).await + } +} + +#[derive(Debug, Clone, serde::Serialize)] +struct RemoveResult { + removed: bool, + #[serde(skip_serializing_if = "Option::is_none")] + reason: Option<String>, + #[serde(skip_serializing_if = "Option::is_none")] + message: Option<String>, + #[serde(skip_serializing_if = "Option::is_none")] + handed_over_to: Option<String>, + #[serde(skip_serializing_if = "Option::is_none")] + handed_over_count: Option<usize>, +} + +impl RemoveResult { + fn not_found() -> Self { + Self { + removed: false, + reason: Some("not-found".to_string()), + message: None, + handed_over_to: None, + handed_over_count: None, + } + } + + fn no_peer(group: &str, name: &str) -> Self { + Self { + removed: false, + reason: Some("no-peer".to_string()), + message: Some(format!( + "{group}/{name} still owns pending entries and is the only \ + consumer in its group; add another consumer first so its \ + PEL can be handed over before 
deletion." + )), + handed_over_to: None, + handed_over_count: None, + } + } + + fn removed(peer: &str, count: usize) -> Self { + Self { + removed: true, + reason: None, + message: None, + handed_over_to: Some(peer.to_string()), + handed_over_count: Some(count), + } + } + + fn handover_failed(group: &str, name: &str, peer: &str, err: &str) -> Self { + Self { + removed: false, + reason: Some("handover-failed".to_string()), + message: Some(format!( + "Handover from {group}/{name} to {peer} failed before XGROUP DELCONSUMER \ + could run: {err}. {group}/{name} is still in the group; retry the remove \ + or investigate the Redis error before deleting (DELCONSUMER would destroy \ + the source consumer's pending entries)." + )), + handed_over_to: None, + handed_over_count: None, + } + } +} + +#[derive(Clone)] +struct AppState { + stream: EventStream, + demo: Arc<StreamingDemo>, + maxlen_approx: usize, + claim_idle_ms: u64, +} + +#[tokio::main] +async fn main() { + let mut host = String::from("127.0.0.1"); + let mut port: u16 = 8788; + let mut redis_host = env::var("REDIS_HOST").unwrap_or_else(|_| "localhost".to_string()); + let mut redis_port: u16 = env::var("REDIS_PORT") + .ok() + .and_then(|s| s.parse().ok()) + .unwrap_or(6379); + let mut stream_key = String::from("demo:events:orders"); + let mut maxlen: usize = 2000; + let mut claim_idle_ms: u64 = 5000; + let mut reset_on_start = true; + + let args: Vec<String> = env::args().collect(); + let mut i = 1; + while i < args.len() { + match args[i].as_str() { + "--host" if i + 1 < args.len() => { + host = args[i + 1].clone(); + i += 2; + } + "--port" if i + 1 < args.len() => { + port = args[i + 1].parse().expect("invalid --port"); + i += 2; + } + "--redis-host" if i + 1 < args.len() => { + redis_host = args[i + 1].clone(); + i += 2; + } + "--redis-port" if i + 1 < args.len() => { + redis_port = args[i + 1].parse().expect("invalid --redis-port"); + i += 2; + } + "--stream-key" if i + 1 < args.len() => { + stream_key = args[i + 1].clone(); + i += 2; 
+ } + "--maxlen" if i + 1 < args.len() => { + maxlen = args[i + 1].parse().expect("invalid --maxlen"); + i += 2; + } + "--claim-idle-ms" if i + 1 < args.len() => { + claim_idle_ms = args[i + 1].parse().expect("invalid --claim-idle-ms"); + i += 2; + } + "--no-reset" => { + reset_on_start = false; + i += 1; + } + _ => { + i += 1; + } + } + } + + let url = format!("redis://{}:{}/", redis_host, redis_port); + let client = Client::open(url).expect("failed to create Redis client"); + let conn = ConnectionManager::new(client) + .await + .expect("failed to connect to Redis"); + + let stream = EventStream::new(conn, stream_key.clone(), maxlen, claim_idle_ms); + let demo = StreamingDemo::new(stream.clone()); + + if reset_on_start { + println!( + "Deleting any existing data at key '{}' for a clean demo run \ + (pass --no-reset to keep it).", + stream_key + ); + stream.delete_stream().await; + } + + let groups = default_groups(); + let groups_ref: Vec<(&str, Vec<&str>)> = + groups.iter().map(|(g, n)| (*g, n.clone())).collect(); + let seeded = demo.seed(&groups_ref).await; + + println!( + "Redis streaming demo server listening on http://{}:{}", + host, port + ); + println!( + "Using Redis at {}:{} with stream key '{}' (MAXLEN ~ {})", + redis_host, redis_port, stream_key, maxlen + ); + println!( + "Seeded {} consumer(s) across {} group(s)", + seeded, + groups.len() + ); + + let state = AppState { + stream, + demo: demo.clone(), + maxlen_approx: maxlen, + claim_idle_ms, + }; + + let app = Router::new() + .route("/", get(index)) + .route("/state", get(state_handler)) + .route("/replay", get(replay)) + .route("/produce", post(produce)) + .route("/add-worker", post(add_worker)) + .route("/remove-worker", post(remove_worker)) + .route("/crash", post(crash)) + .route("/autoclaim", post(autoclaim)) + .route("/trim", post(trim)) + .route("/reset", post(reset)) + .with_state(state); + + let listener = tokio::net::TcpListener::bind((host.as_str(), port)) + .await + .expect("failed to 
bind"); + let serve = axum::serve(listener, app); + if let Err(err) = serve.await { + eprintln!("server error: {}", err); + } + demo.stop_all().await; +} + +async fn index(State(state): State<AppState>) -> Response { + let html = render_html_page( + &state.stream.stream_key, + state.maxlen_approx, + state.claim_idle_ms, + ); + ( + [(header::CONTENT_TYPE, "text/html; charset=utf-8")], + html, + ) + .into_response() +} + +async fn state_handler(State(state): State<AppState>) -> Response { + Json(build_state(&state).await).into_response() +} + +#[derive(Deserialize)] +struct ProduceForm { + count: Option<String>, + #[serde(rename = "type")] + event_type: Option<String>, +} + +async fn produce(State(state): State<AppState>, Form(form): Form<ProduceForm>) -> Response { + let count_raw: i64 = form + .count + .as_deref() + .and_then(|s| s.parse().ok()) + .unwrap_or(1); + let count = count_raw.clamp(1, 500) as usize; + let event_type = form.event_type.unwrap_or_default(); + let event_type = event_type.trim(); + + let mut events: Vec<(String, HashMap<String, String>)> = Vec::with_capacity(count); + for _ in 0..count { + let picked = if event_type.is_empty() { + random_event_type() + } else { + event_type.to_string() + }; + events.push((picked, fake_payload())); + } + let ids = match state.stream.produce_batch(events).await { + Ok(ids) => ids, + Err(err) => { + return error_json(StatusCode::INTERNAL_SERVER_ERROR, &err.to_string()); + } + }; + Json(json!({ + "produced": ids.len(), + "ids": ids, + })) + .into_response() +} + +#[derive(Deserialize)] +struct AddWorkerForm { + group: String, + name: String, +} + +async fn add_worker( + State(state): State<AppState>, + Form(form): Form<AddWorkerForm>, +) -> Response { + let group = form.group.trim(); + let name = form.name.trim(); + if group.is_empty() || name.is_empty() { + return error_json(StatusCode::BAD_REQUEST, "group and name are required"); + } + let added = state.demo.add_worker(group, name).await; + if !added { + return error_json( + StatusCode::CONFLICT, + &format!("{group}/{name} already exists"), + ); + } + 
Json(json!({ "group": group, "name": name })).into_response() +} + +#[derive(Deserialize)] +struct RemoveWorkerForm { + group: String, + name: String, +} + +async fn remove_worker( + State(state): State<AppState>, + Form(form): Form<RemoveWorkerForm>, +) -> Response { + let group = form.group.trim(); + let name = form.name.trim(); + let result = state.demo.remove_worker(group, name).await; + let status = if result.removed || result.reason.as_deref() == Some("not-found") { + StatusCode::OK + } else { + StatusCode::CONFLICT + }; + (status, Json(result)).into_response() +} + +#[derive(Deserialize)] +struct CrashForm { + group: String, + name: String, + count: Option<String>, +} + +async fn crash(State(state): State<AppState>, Form(form): Form<CrashForm>) -> Response { + let group = form.group.trim(); + let name = form.name.trim(); + let count: u64 = form + .count + .as_deref() + .and_then(|s| s.parse().ok()) + .unwrap_or(1); + let Some(worker) = state.demo.get_worker(group, name).await else { + return error_json( + StatusCode::NOT_FOUND, + &format!("unknown consumer {group}/{name}"), + ); + }; + worker.crash_next(count).await; + Json(json!({ "queued": count })).into_response() +} + +#[derive(Deserialize)] +struct AutoclaimForm { + group: String, + consumer: String, +} + +async fn autoclaim( + State(state): State<AppState>, + Form(form): Form<AutoclaimForm>, +) -> Response { + let group = form.group.trim(); + let consumer = form.consumer.trim(); + if group.is_empty() || consumer.is_empty() { + return error_json( + StatusCode::BAD_REQUEST, + "group and consumer are required", + ); + } + let Some(worker) = state.demo.get_worker(group, consumer).await else { + return error_json( + StatusCode::NOT_FOUND, + &format!("unknown consumer {group}/{consumer}"), + ); + }; + // `reap_idle_pel` runs XAUTOCLAIM(self) + process + ack. The + // `deleted` list contains PEL entries whose stream payload was + // already trimmed by `MAXLEN ~` before the sweep ran. 
Redis 7+ + // removes them from the PEL inside XAUTOCLAIM itself, so the caller + // doesn't have to XACK them; in production they'd be routed to a + // dead-letter store for offline inspection. + let result = worker.reap_idle_pel().await; + Json(json!({ + "claimed": result.claimed, + "processed": result.processed, + "deleted": result.deleted_ids, + "min_idle_ms": state.claim_idle_ms, + })) + .into_response() +} + +#[derive(Deserialize)] +struct TrimForm { + maxlen: Option, +} + +async fn trim(State(state): State, Form(form): Form) -> Response { + let maxlen_raw: i64 = form + .maxlen + .as_deref() + .and_then(|s| s.parse().ok()) + .unwrap_or(0); + let maxlen = maxlen_raw.max(0) as usize; + let deleted = state.stream.trim_maxlen(maxlen).await.unwrap_or(0); + Json(json!({ "deleted": deleted, "maxlen": maxlen })).into_response() +} + +#[derive(Deserialize)] +struct ReplayParams { + start: Option, + end: Option, + count: Option, +} + +async fn replay( + State(state): State, + Query(params): Query, +) -> Response { + let start = params.start.unwrap_or_else(|| "-".to_string()); + let end = params.end.unwrap_or_else(|| "+".to_string()); + let limit_raw: i64 = params + .count + .as_deref() + .and_then(|s| s.parse().ok()) + .unwrap_or(20); + let limit = limit_raw.clamp(1, 500) as usize; + let entries = state.stream.replay(&start, &end, limit).await.unwrap_or_default(); + let payload: Vec = entries + .into_iter() + .map(|(id, fields)| json!({ "id": id, "fields": fields })) + .collect(); + Json(json!({ + "start": start, + "end": end, + "limit": limit, + "entries": payload, + })) + .into_response() +} + +async fn reset(State(state): State) -> Response { + let count = state.demo.reset().await; + Json(json!({ "consumers": count })).into_response() +} + +async fn build_state(state: &AppState) -> Value { + let stream_info = state.stream.info_stream().await; + let groups = state.stream.info_groups().await; + + // Workers snapshot taken once per /state call so concurrent + // 
add/remove requests can't change it mid-loop. + let workers = state.demo.snapshot().await; + + let mut groups_detail: Vec = Vec::with_capacity(groups.len()); + let mut pending_rows: Vec = Vec::new(); + + for group in groups { + let consumer_info = state.stream.info_consumers(&group.name).await; + let info_by_name: HashMap = + consumer_info.iter().map(|c| (c.name.clone(), c)).collect(); + + let mut consumers_detail: Vec = Vec::new(); + let mut seen_names: HashSet = HashSet::new(); + for ((g_name, c_name), worker) in workers.iter() { + if g_name != &group.name { + continue; + } + let info = info_by_name.get(c_name).copied(); + let status: ConsumerStatus = worker.status().await; + let recent = worker.recent().await; + consumers_detail.push(consumer_detail_json(&status, info, recent)); + seen_names.insert(c_name.clone()); + } + // Also include consumers that exist in Redis but not in our + // in-process registry (e.g. orphaned after a restart). + for c in consumer_info.iter() { + if seen_names.contains(&c.name) { + continue; + } + consumers_detail.push(json!({ + "name": c.name, + "group": group.name, + "processed": 0, + "reaped": 0, + "crashed_drops": 0, + "paused": false, + "crash_queued": 0, + "alive": false, + "pending": c.pending, + "idle_ms": c.idle_ms, + "recent": Vec::::new(), + })); + } + consumers_detail.sort_by(|a, b| { + a.get("name") + .and_then(|v| v.as_str()) + .unwrap_or("") + .cmp(b.get("name").and_then(|v| v.as_str()).unwrap_or("")) + }); + + groups_detail.push(json!({ + "name": group.name, + "consumers": group.consumers, + "pending": group.pending, + "last_delivered_id": group.last_delivered_id, + "lag": group.lag, + "consumers_detail": consumers_detail, + })); + + for row in state.stream.pending_detail(&group.name, 50).await { + pending_rows.push(json!({ + "id": row.id, + "consumer": row.consumer, + "idle_ms": row.idle_ms, + "deliveries": row.deliveries, + "group": group.name, + })); + } + } + + let tail_entries = 
state.stream.tail(10).await.unwrap_or_default(); + let tail: Vec = tail_entries + .into_iter() + .map(|(id, fields)| json!({ "id": id, "fields": fields })) + .collect(); + + let stats = state.stream.stats(); + + json!({ + "stream": { + "length": stream_info.length, + "last_generated_id": stream_info.last_generated_id, + "first_entry_id": stream_info.first_entry_id, + "last_entry_id": stream_info.last_entry_id, + }, + "tail": tail, + "groups": groups_detail, + "pending": pending_rows, + "stats": { + "produced_total": stats.produced_total, + "acked_total": stats.acked_total, + "claimed_total": stats.claimed_total, + }, + }) +} + +fn consumer_detail_json( + status: &ConsumerStatus, + info: Option<&event_stream::ConsumerInfo>, + recent: Vec, +) -> Value { + let (pending, idle_ms) = match info { + Some(c) => (c.pending, c.idle_ms), + None => (0, 0), + }; + json!({ + "name": status.name, + "group": status.group, + "processed": status.processed, + "reaped": status.reaped, + "crashed_drops": status.crashed_drops, + "paused": status.paused, + "crash_queued": status.crash_queued, + "alive": status.alive, + "pending": pending, + "idle_ms": idle_ms, + "recent": recent, + }) +} + +fn error_json(status: StatusCode, message: &str) -> Response { + (status, Json(json!({ "error": message }))).into_response() +} + +fn random_event_type() -> String { + let mut rng = rand::thread_rng(); + EVENT_TYPES.choose(&mut rng).unwrap_or(&EVENT_TYPES[0]).to_string() +} + +fn fake_payload() -> HashMap { + let mut rng = rand::thread_rng(); + let mut m: HashMap = HashMap::new(); + m.insert("order_id".to_string(), format!("o-{}", rng.gen_range(1000..10_000))); + m.insert( + "customer".to_string(), + CUSTOMERS.choose(&mut rng).unwrap_or(&"alice").to_string(), + ); + m.insert( + "amount".to_string(), + format!("{:.2}", rng.gen_range(5.0_f64..250.0_f64)), + ); + m +} + +fn render_html_page(stream_key: &str, maxlen: usize, claim_idle_ms: u64) -> String { + HTML_TEMPLATE + .replace("__STREAM_KEY__", 
stream_key) + .replace("__MAXLEN__", &maxlen.to_string()) + .replace("__CLAIM_IDLE__", &claim_idle_ms.to_string()) +} + +const HTML_TEMPLATE: &str = r##" + + + + + Redis Streaming Demo + + + +
+
redis-rs + axum
+

Redis Streaming Demo

+

+ Producers append events to a single Redis Stream + (__STREAM_KEY__). Two consumer groups read the same + stream independently: notifications shares its work + across two consumers, analytics processes the full + flow on its own. Acknowledge with XACK, recover + crashed deliveries with XAUTOCLAIM, replay any range + with XRANGE, and bound retention with XTRIM. +

+ +
+
+

Stream state

+
Loading...
+ + +
+ +
+

Produce events

+

Events are appended with XADD with an approximate + MAXLEN ~ __MAXLEN__ retention cap.

+ + + + + +
+ +
+

Replay range (XRANGE)

+

Reads a slice of history. Replay is independent of any + consumer group — no cursors move, no acks happen.

+ + + + + + + +
+ +
+

Trim retention (XTRIM)

+

Cap the stream length. Approximate trimming releases whole + macro-nodes, which is much cheaper than exact trimming.

+ + + +
+ +
+

Consumer groups

+
Loading...
+
+ +
+

Pending entries (XPENDING)

+

Entries delivered to a consumer that haven't been acked yet. + Entries idle for ≥ __CLAIM_IDLE__ ms are eligible for + XAUTOCLAIM.

+
Loading...
+
+ + +
+
+ +
+

Last result

+

Produce events, replay a range, or trigger an autoclaim to see results.

+
+
+ +
+
+ + + + +"##; diff --git a/content/develop/use-cases/streaming/rust/event_stream.rs b/content/develop/use-cases/streaming/rust/event_stream.rs new file mode 100644 index 0000000000..77f55ccada --- /dev/null +++ b/content/develop/use-cases/streaming/rust/event_stream.rs @@ -0,0 +1,736 @@ +//! Redis event-stream helper backed by a single Redis Stream. +//! +//! Producers append events with `XADD`. Consumers belong to consumer +//! groups and read with `XREADGROUP`. The group as a whole tracks a +//! single `last-delivered-id` cursor, and each consumer gets its own +//! pending-entries list (PEL) of in-flight messages it has been +//! handed. Once a consumer has processed an entry it acknowledges it +//! with `XACK`; entries left unacknowledged past an idle threshold can +//! be swept to a healthy consumer with `XAUTOCLAIM` (or to a specific +//! one with `XCLAIM`). +//! +//! Each `XADD` carries an approximate `MAXLEN` so the stream stays +//! bounded as it rolls forward. `XRANGE` supports replay over the +//! retained history for debugging, audit, or rebuilding a downstream +//! projection. Note that approximate trimming can release entries that +//! are still in a group's PEL: those entries appear in `XAUTOCLAIM`'s +//! deleted-IDs list, which the caller should log and route to a +//! dead-letter store. Redis 7+ removes them from the PEL inside the +//! `XAUTOCLAIM` call itself, so no explicit `XACK` is needed. +//! +//! The same stream can be read by any number of consumer groups — each +//! group has its own cursor and its own pending lists, so analytics, +//! notifications, and audit can all process the full event flow at +//! their own pace without coordinating with each other. 
+ +use std::collections::HashMap; +use std::sync::atomic::{AtomicU64, Ordering}; +use std::sync::Arc; +use std::time::{SystemTime, UNIX_EPOCH}; + +use redis::aio::ConnectionManager; +use redis::streams::{ + StreamMaxlen, StreamPendingCountReply, StreamRangeReply, StreamReadOptions, StreamReadReply, +}; +use redis::{AsyncCommands, RedisError, RedisResult, Value}; + +/// A single stream entry: `(id, field/value map)`. +pub type Entry = (String, HashMap<String, String>); + +/// One pending-entry row from `XPENDING ... <start> <end> <count>`. +#[derive(Debug, Clone)] +pub struct PendingEntry { + pub id: String, + pub consumer: String, + pub idle_ms: u64, + pub deliveries: u64, +} + +/// Cached snapshot of `XINFO STREAM` for the demo UI. +#[derive(Debug, Clone, Default)] +pub struct StreamInfo { + pub length: u64, + pub last_generated_id: Option<String>, + pub first_entry_id: Option<String>, + pub last_entry_id: Option<String>, +} + +/// Per-group info from `XINFO GROUPS`. +#[derive(Debug, Clone, Default)] +pub struct GroupInfo { + pub name: String, + pub consumers: u64, + pub pending: u64, + pub last_delivered_id: String, + /// `lag` is the number of stream entries not yet delivered to the + /// group. Redis reports it as nil when it cannot be determined, + /// and versions before 7.0 omit the field, so it is optional here. + pub lag: Option, +} + +/// Per-consumer info from `XINFO CONSUMERS <key> <group>`. +#[derive(Debug, Clone, Default)] +pub struct ConsumerInfo { + pub name: String, + pub pending: u64, + pub idle_ms: u64, +} + +#[derive(Default)] +struct EventStreamStats { + produced_total: AtomicU64, + acked_total: AtomicU64, + claimed_total: AtomicU64, +} + +/// Producer/consumer helper for a single Redis Stream with consumer +/// groups. +/// +/// Holds a cloneable `ConnectionManager` plus three counters. Every +/// helper method is `async` and takes `&self`; clone the struct cheaply +/// to share it across tasks. 
+#[derive(Clone)] +pub struct EventStream { + conn: ConnectionManager, + pub stream_key: String, + pub maxlen_approx: usize, + pub claim_min_idle_ms: u64, + stats: Arc, +} + +impl EventStream { + pub fn new( + conn: ConnectionManager, + stream_key: impl Into, + maxlen_approx: usize, + claim_min_idle_ms: u64, + ) -> Self { + Self { + conn, + stream_key: stream_key.into(), + maxlen_approx, + claim_min_idle_ms, + stats: Arc::new(EventStreamStats::default()), + } + } + + // ------------------------------------------------------------------ + // Producer + // ------------------------------------------------------------------ + + /// Append a single event. Returns the stream ID Redis assigned. + #[allow(dead_code)] + pub async fn produce( + &self, + event_type: &str, + payload: HashMap, + ) -> RedisResult { + let mut ids = self.produce_batch(vec![(event_type.to_string(), payload)]).await?; + // produce_batch always returns one id per input event. + Ok(ids.pop().unwrap_or_default()) + } + + /// Pipeline several `XADD` calls in one round trip. + /// + /// Each entry carries an approximate `MAXLEN` cap. The `~` flavour + /// lets Redis trim at a macro-node boundary, which is much cheaper + /// than exact trimming and is the right call for a retention + /// guardrail rather than a hard size limit. + pub async fn produce_batch( + &self, + events: Vec<(String, HashMap)>, + ) -> RedisResult> { + if events.is_empty() { + return Ok(Vec::new()); + } + let mut pipe = redis::pipe(); + for (event_type, payload) in &events { + let fields = encode_fields(event_type, payload); + // redis-rs xadd_maxlen takes ownership of the maxlen flavour, + // so we have to build the (key, value) pairs as Vec<(String, String)>. 
+ let pairs: Vec<(String, String)> = + fields.into_iter().collect(); + pipe.cmd("XADD") + .arg(&self.stream_key) + .arg("MAXLEN") + .arg("~") + .arg(self.maxlen_approx) + .arg("*"); + for (k, v) in &pairs { + pipe.arg(k).arg(v); + } + } + let mut conn = self.conn.clone(); + let ids: Vec = pipe.query_async(&mut conn).await?; + self.stats + .produced_total + .fetch_add(ids.len() as u64, Ordering::Relaxed); + Ok(ids) + } + + // ------------------------------------------------------------------ + // Consumer groups + // ------------------------------------------------------------------ + + /// Create the consumer group if it doesn't exist. + /// + /// `$` means "deliver only events appended after this point"; pass + /// `0-0` to replay the entire stream into a fresh group. `BUSYGROUP` + /// errors are swallowed so this method is idempotent. + pub async fn ensure_group(&self, group: &str, start_id: &str) -> RedisResult<()> { + let mut conn = self.conn.clone(); + let res: RedisResult<()> = redis::cmd("XGROUP") + .arg("CREATE") + .arg(&self.stream_key) + .arg(group) + .arg(start_id) + .arg("MKSTREAM") + .query_async(&mut conn) + .await; + match res { + Ok(()) => Ok(()), + Err(err) if is_busygroup(&err) => Ok(()), + Err(err) => Err(err), + } + } + + /// Read new entries for this consumer via `XREADGROUP`. + /// + /// The `>` ID means "deliver entries this consumer group has not + /// delivered to anyone yet" — that is the at-least-once path. + /// Replaying an explicit ID instead would re-deliver an entry that + /// is already in this consumer's pending list (see + /// `consume_own_pel` for that recovery path). 
+ pub async fn consume( + &self, + group: &str, + consumer: &str, + count: usize, + block_ms: usize, + ) -> RedisResult> { + let opts = StreamReadOptions::default() + .group(group, consumer) + .count(count) + .block(block_ms); + let mut conn = self.conn.clone(); + let reply: Option = conn + .xread_options(&[self.stream_key.as_str()], &[">"], &opts) + .await?; + Ok(flatten_read_reply(reply)) + } + + /// Re-deliver entries already in this consumer's PEL. + /// + /// Reading with an explicit ID (`0` here) instead of `>` replays + /// the entries already assigned to this consumer name without + /// advancing the group's `last-delivered-id`. This is the + /// canonical recovery path after a crash on the same consumer name. + #[allow(dead_code)] + pub async fn consume_own_pel( + &self, + group: &str, + consumer: &str, + count: usize, + ) -> RedisResult> { + let opts = StreamReadOptions::default() + .group(group, consumer) + .count(count); + let mut conn = self.conn.clone(); + let reply: Option = conn + .xread_options(&[self.stream_key.as_str()], &["0"], &opts) + .await?; + Ok(flatten_read_reply(reply)) + } + + /// `XACK` a batch of entry IDs. Returns the number actually acked. + pub async fn ack(&self, group: &str, ids: Vec) -> RedisResult { + if ids.is_empty() { + return Ok(0); + } + let mut conn = self.conn.clone(); + let n: i64 = conn.xack(&self.stream_key, group, &ids).await?; + self.stats.acked_total.fetch_add(n as u64, Ordering::Relaxed); + Ok(n) + } + + /// Sweep idle pending entries to `consumer`. + /// + /// A single `XAUTOCLAIM` call scans up to `page_count` PEL entries + /// starting at `start_id` and returns a continuation cursor. For a + /// full sweep of the PEL, loop until the cursor returns to `0-0` + /// (or hit `max_pages` as a safety net so a very large PEL can't + /// monopolise the call). + /// + /// Returns `(claimed, deleted_ids)`. 
`deleted_ids` are PEL entries + /// whose stream payload had already been trimmed by the time this + /// sweep ran (typically because `MAXLEN ~` retention outran a slow + /// consumer). `XAUTOCLAIM` removes those dangling slots from the + /// PEL itself — the caller does not need to `XACK` them — but they + /// cannot be retried, so log and route them to a dead-letter store + /// for observability. + pub async fn autoclaim( + &self, + group: &str, + consumer: &str, + page_count: usize, + start_id: &str, + max_pages: usize, + ) -> RedisResult<(Vec<Entry>, Vec<String>)> { + let mut claimed_all: Vec<Entry> = Vec::new(); + let mut deleted_all: Vec<String> = Vec::new(); + let mut cursor = start_id.to_string(); + let mut conn = self.conn.clone(); + for _ in 0..max_pages { + // XAUTOCLAIM <key> <group> <consumer> <min-idle-time> <start> [COUNT count] + // Reply (Redis 7+): [ next-cursor, [ [id, [field, value, ...]], ... ], [deleted-id, ...] ] + // + // redis-rs 0.24 has no typed wrapper for XAUTOCLAIM; we + // build it by hand and decode the three-element reply into + // (cursor, entries, deleted). The Vec::from_redis_value + // implementation is exposed via redis::FromRedisValue. + let raw: Value = redis::cmd("XAUTOCLAIM") + .arg(&self.stream_key) + .arg(group) + .arg(consumer) + .arg(self.claim_min_idle_ms) + .arg(&cursor) + .arg("COUNT") + .arg(page_count) + .query_async(&mut conn) + .await?; + let (next_cursor, claimed, deleted) = parse_autoclaim_reply(raw)?; + claimed_all.extend(claimed); + deleted_all.extend(deleted); + if next_cursor == "0-0" { + break; + } + cursor = next_cursor; + } + self.stats + .claimed_total + .fetch_add(claimed_all.len() as u64, Ordering::Relaxed); + Ok((claimed_all, deleted_all)) + } + + /// `XGROUP DELCONSUMER` — destroys the consumer's PEL entries + /// outright. Always call `handover_pending` first if the source + /// still owns entries. 
+ pub async fn delete_consumer(&self, group: &str, consumer: &str) -> RedisResult { + let mut conn = self.conn.clone(); + let res: RedisResult = redis::cmd("XGROUP") + .arg("DELCONSUMER") + .arg(&self.stream_key) + .arg(group) + .arg(consumer) + .query_async(&mut conn) + .await; + match res { + Ok(n) => Ok(n), + Err(_) => Ok(0), + } + } + + /// Move every PEL entry owned by `from_consumer` to `to_consumer`. + /// + /// Enumerates the source consumer's PEL with `XPENDING ... CONSUMER` + /// and reassigns each ID with `XCLAIM` at zero idle time so the + /// move is unconditional. (`XAUTOCLAIM` does not filter by source + /// consumer, so it cannot be used for a per-consumer handover.) + /// Returns the number of entries that were actually moved. + pub async fn handover_pending( + &self, + group: &str, + from_consumer: &str, + to_consumer: &str, + batch: usize, + ) -> RedisResult { + let mut conn = self.conn.clone(); + let mut moved: usize = 0; + loop { + let reply: StreamPendingCountReply = redis::cmd("XPENDING") + .arg(&self.stream_key) + .arg(group) + .arg("-") + .arg("+") + .arg(batch) + .arg(from_consumer) + .query_async(&mut conn) + .await?; + if reply.ids.is_empty() { + break; + } + let ids: Vec = reply.ids.iter().map(|row| row.id.clone()).collect(); + // XCLAIM id [id ...] + // We don't care about the parsed reply shape here; JUSTID + // avoids decoding payloads for entries the demo isn't going + // to process inline. 
+ let claimed_ids: Vec = redis::cmd("XCLAIM") + .arg(&self.stream_key) + .arg(group) + .arg(to_consumer) + .arg(0) + .arg(&ids) + .arg("JUSTID") + .query_async(&mut conn) + .await?; + moved += claimed_ids.len(); + if reply.ids.len() < batch { + break; + } + } + self.stats + .claimed_total + .fetch_add(moved as u64, Ordering::Relaxed); + Ok(moved) + } + + // ------------------------------------------------------------------ + // Replay, length, trim + // ------------------------------------------------------------------ + + /// Range read with `XRANGE` for replay or audit. + /// + /// Read-only: ranges do not update any group cursor and do not ack + /// anything. Useful for bootstrapping a new projection, for building + /// an audit view, or for debugging what actually went through the + /// stream. + pub async fn replay( + &self, + start_id: &str, + end_id: &str, + count: usize, + ) -> RedisResult> { + let mut conn = self.conn.clone(); + let reply: StreamRangeReply = conn + .xrange_count(&self.stream_key, start_id, end_id, count) + .await?; + Ok(stream_ids_to_entries(reply.ids)) + } + + /// Newest-first range, used for the demo's "tail" view. + pub async fn tail(&self, count: usize) -> RedisResult> { + let mut conn = self.conn.clone(); + let reply: StreamRangeReply = conn + .xrevrange_count(&self.stream_key, "+", "-", count) + .await?; + Ok(stream_ids_to_entries(reply.ids)) + } + + #[allow(dead_code)] + pub async fn length(&self) -> RedisResult { + let mut conn = self.conn.clone(); + Ok(conn.xlen(&self.stream_key).await?) + } + + pub async fn trim_maxlen(&self, maxlen: usize) -> RedisResult { + let mut conn = self.conn.clone(); + Ok(conn.xtrim(&self.stream_key, StreamMaxlen::Approx(maxlen)).await?) 
+ } + + #[allow(dead_code)] + pub async fn trim_minid(&self, minid: &str) -> RedisResult { + let mut conn = self.conn.clone(); + Ok(redis::cmd("XTRIM") + .arg(&self.stream_key) + .arg("MINID") + .arg("~") + .arg(minid) + .query_async(&mut conn) + .await + .unwrap_or(0)) + } + + // ------------------------------------------------------------------ + // Inspection + // ------------------------------------------------------------------ + + /// Subset of `XINFO STREAM` that's safe to JSON-encode. + pub async fn info_stream(&self) -> StreamInfo { + let mut conn = self.conn.clone(); + // The typed `StreamInfoStreamReply` would work for length and + // last_generated_id, but it parses the radix-tree fields as + // usize and panics on streams Redis returns without them. We + // decode `XINFO STREAM` by hand into a plain `HashMap` and read out only the fields the demo needs. + let raw: Value = match redis::cmd("XINFO") + .arg("STREAM") + .arg(&self.stream_key) + .query_async(&mut conn) + .await + { + Ok(v) => v, + Err(_) => return StreamInfo::default(), + }; + let map: HashMap = match redis::FromRedisValue::from_redis_value(&raw) { + Ok(m) => m, + Err(_) => return StreamInfo::default(), + }; + let length: u64 = map + .get("length") + .and_then(|v| redis::FromRedisValue::from_redis_value(v).ok()) + .unwrap_or(0); + let last_generated_id: Option = map + .get("last-generated-id") + .and_then(|v| redis::FromRedisValue::from_redis_value(v).ok()); + let first_entry_id = entry_id_from_value(map.get("first-entry")); + let last_entry_id = entry_id_from_value(map.get("last-entry")); + StreamInfo { + length, + last_generated_id, + first_entry_id, + last_entry_id, + } + } + + /// `XINFO GROUPS` with the lag field surfaced (Redis 7+). 
+ pub async fn info_groups(&self) -> Vec { + let mut conn = self.conn.clone(); + let raw: Value = match redis::cmd("XINFO") + .arg("GROUPS") + .arg(&self.stream_key) + .query_async(&mut conn) + .await + { + Ok(v) => v, + Err(_) => return Vec::new(), + }; + // Each group is a flat alternating key/value array. Decode as + // a vec of HashMap so we can pull only the keys + // we care about (and tolerate version differences). + let rows: Vec> = + match redis::FromRedisValue::from_redis_value(&raw) { + Ok(v) => v, + Err(_) => return Vec::new(), + }; + rows.into_iter() + .map(|map| GroupInfo { + name: map + .get("name") + .and_then(|v| redis::FromRedisValue::from_redis_value(v).ok()) + .unwrap_or_default(), + consumers: map + .get("consumers") + .and_then(|v| redis::FromRedisValue::from_redis_value(v).ok()) + .unwrap_or(0), + pending: map + .get("pending") + .and_then(|v| redis::FromRedisValue::from_redis_value(v).ok()) + .unwrap_or(0), + last_delivered_id: map + .get("last-delivered-id") + .and_then(|v| redis::FromRedisValue::from_redis_value(v).ok()) + .unwrap_or_default(), + lag: map + .get("lag") + .and_then(|v| redis::FromRedisValue::from_redis_value(v).ok()), + }) + .collect() + } + + /// `XINFO CONSUMERS `. 
+ pub async fn info_consumers(&self, group: &str) -> Vec { + let mut conn = self.conn.clone(); + let raw: Value = match redis::cmd("XINFO") + .arg("CONSUMERS") + .arg(&self.stream_key) + .arg(group) + .query_async(&mut conn) + .await + { + Ok(v) => v, + Err(_) => return Vec::new(), + }; + let rows: Vec> = + match redis::FromRedisValue::from_redis_value(&raw) { + Ok(v) => v, + Err(_) => return Vec::new(), + }; + rows.into_iter() + .map(|map| ConsumerInfo { + name: map + .get("name") + .and_then(|v| redis::FromRedisValue::from_redis_value(v).ok()) + .unwrap_or_default(), + pending: map + .get("pending") + .and_then(|v| redis::FromRedisValue::from_redis_value(v).ok()) + .unwrap_or(0), + idle_ms: map + .get("idle") + .and_then(|v| redis::FromRedisValue::from_redis_value(v).ok()) + .unwrap_or(0), + }) + .collect() + } + + /// Per-entry PEL view (`XPENDING - + `). + pub async fn pending_detail(&self, group: &str, count: usize) -> Vec { + let mut conn = self.conn.clone(); + let reply: RedisResult = redis::cmd("XPENDING") + .arg(&self.stream_key) + .arg(group) + .arg("-") + .arg("+") + .arg(count) + .query_async(&mut conn) + .await; + match reply { + Ok(r) => r + .ids + .into_iter() + .map(|row| PendingEntry { + id: row.id, + consumer: row.consumer, + idle_ms: row.last_delivered_ms as u64, + deliveries: row.times_delivered as u64, + }) + .collect(), + Err(_) => Vec::new(), + } + } + + // ------------------------------------------------------------------ + // Stats and demo housekeeping + // ------------------------------------------------------------------ + + pub fn stats(&self) -> Stats { + Stats { + produced_total: self.stats.produced_total.load(Ordering::Relaxed), + acked_total: self.stats.acked_total.load(Ordering::Relaxed), + claimed_total: self.stats.claimed_total.load(Ordering::Relaxed), + } + } + + pub fn reset_stats(&self) { + self.stats.produced_total.store(0, Ordering::Relaxed); + self.stats.acked_total.store(0, Ordering::Relaxed); + 
self.stats.claimed_total.store(0, Ordering::Relaxed); + } + + /// Drop the stream key entirely. Used by the demo's reset path. + pub async fn delete_stream(&self) { + let mut conn = self.conn.clone(); + let _: RedisResult = conn.del(&self.stream_key).await; + } +} + +#[derive(Debug, Clone, Copy, Default)] +pub struct Stats { + pub produced_total: u64, + pub acked_total: u64, + pub claimed_total: u64, +} + +// ---------------------------------------------------------------------- +// Helpers +// ---------------------------------------------------------------------- + +fn encode_fields(event_type: &str, payload: &HashMap) -> Vec<(String, String)> { + let mut out: Vec<(String, String)> = Vec::with_capacity(payload.len() + 2); + out.push(("type".to_string(), event_type.to_string())); + out.push(("ts_ms".to_string(), now_unix_ms_str())); + for (k, v) in payload { + out.push((k.clone(), v.clone())); + } + out +} + +fn now_unix_ms_str() -> String { + SystemTime::now() + .duration_since(UNIX_EPOCH) + .map(|d| d.as_millis().to_string()) + .unwrap_or_else(|_| "0".to_string()) +} + +fn is_busygroup(err: &RedisError) -> bool { + // Both detail() and to_string() include the BUSYGROUP token on a + // duplicate-create; check both for safety across redis-rs versions. 
+ if let Some(detail) = err.detail() { + if detail.contains("BUSYGROUP") { + return true; + } + } + err.to_string().contains("BUSYGROUP") +} + +fn flatten_read_reply(reply: Option<StreamReadReply>) -> Vec<Entry> { + let mut out: Vec<Entry> = Vec::new(); + let Some(reply) = reply else { + return out; + }; + for key in reply.keys { + for sid in key.ids { + out.push((sid.id, fields_from_stream_map(&sid.map))); + } + } + out +} + +fn stream_ids_to_entries(ids: Vec<redis::streams::StreamId>) -> Vec<Entry> { + ids.into_iter() + .map(|sid| (sid.id, fields_from_stream_map(&sid.map))) + .collect() +} + +fn fields_from_stream_map(map: &HashMap<String, Value>) -> HashMap<String, String> { + let mut out: HashMap<String, String> = HashMap::with_capacity(map.len()); + for (k, v) in map { + let s: String = redis::FromRedisValue::from_redis_value(v).unwrap_or_default(); + out.insert(k.clone(), s); + } + out +} + +fn entry_id_from_value(v: Option<&Value>) -> Option<String> { + // first-entry / last-entry come back as `[id, [field, value, ...]]` + // or as Nil if the stream is empty. We only need the id. + let Some(v) = v else { return None }; + if matches!(v, Value::Nil) { + return None; + } + if let Value::Bulk(items) = v { + if let Some(first) = items.first() { + if let Ok(id) = redis::FromRedisValue::from_redis_value(first) { + return Some(id); + } + } + } + None +} + +/// Decode a Redis 7+ `XAUTOCLAIM` reply: +/// `[ next-cursor, [ [id, [field, value, ...]], ... ], [deleted-id, ...] ]`. 
+fn parse_autoclaim_reply(raw: Value) -> RedisResult<(String, Vec, Vec)> { + let parts = match raw { + Value::Bulk(parts) => parts, + _ => { + return Err(RedisError::from(( + redis::ErrorKind::TypeError, + "XAUTOCLAIM: expected array reply", + ))) + } + }; + let mut iter = parts.into_iter(); + let cursor_v = iter + .next() + .ok_or_else(|| RedisError::from((redis::ErrorKind::TypeError, "XAUTOCLAIM: missing cursor")))?; + let entries_v = iter + .next() + .ok_or_else(|| RedisError::from((redis::ErrorKind::TypeError, "XAUTOCLAIM: missing entries")))?; + let deleted_v = iter.next(); // Redis 7.0+; absent on 6.2 (not supported) + + let next_cursor: String = redis::FromRedisValue::from_redis_value(&cursor_v)?; + + // Re-use the StreamRangeReply decoder for the entries vec: the + // wire shape is identical to `XRANGE` (an array of [id, [field, + // value, ...]] pairs). + let range: StreamRangeReply = redis::FromRedisValue::from_redis_value(&entries_v)?; + let entries = stream_ids_to_entries(range.ids); + + let deleted: Vec = match deleted_v { + Some(v) => redis::FromRedisValue::from_redis_value(&v).unwrap_or_default(), + None => Vec::new(), + }; + + Ok((next_cursor, entries, deleted)) +} From 94f0d7a49d4b0b1446e49717dce64ee44cb8c47e Mon Sep 17 00:00:00 2001 From: Andy Stark Date: Thu, 14 May 2026 16:16:45 +0100 Subject: [PATCH 4/4] DOC-6619 lessons learned -> agent skill updates --- .../assets/audit-checklist.md | 49 +++++++++++++++++++ 1 file changed, 49 insertions(+) diff --git a/.agents/skills/redis-use-case-ports/assets/audit-checklist.md b/.agents/skills/redis-use-case-ports/assets/audit-checklist.md index 69acc2c078..04fe6c0d1d 100644 --- a/.agents/skills/redis-use-case-ports/assets/audit-checklist.md +++ b/.agents/skills/redis-use-case-ports/assets/audit-checklist.md @@ -322,6 +322,55 @@ The recommended cross-client idiom is to **bypass the library wrapper** and send --- +## 22. 
Typed `XAUTOCLAIM` wrappers that silently drop the deleted-IDs slot + +**What to scan for:** any helper that calls the client library's typed `xautoclaim` / `XAutoClaim` / `StreamAutoClaim` wrapper. Look at the return-type binding: does it expose a third slot (deleted IDs / `deleted_messages` / `deletedIds`) alongside the next-cursor and claimed-messages? + +**Pass criterion:** the helper must surface the third slot of the Redis 7+ `XAUTOCLAIM` reply (the IDs whose stream payload was trimmed out before the claim ran). The reference helper's API is `(claimed, deleted_ids)` — and the caller is expected to log/route the deleted IDs to a dead-letter store. If the client library's typed wrapper hides the third slot (extremely common), the helper must drop to a raw-command path (`client.Do("XAUTOCLAIM", ...)`, `Jedis.sendCommand(XAUTOCLAIM, ...)`, `connection.dispatch(CommandType.XAUTOCLAIM, NestedMultiOutput, ...)`, `redis.call('XAUTOCLAIM', ...)`, `redis::cmd("XAUTOCLAIM").query_async(...)`) and parse the three-element reply by hand. **A wrapper that returns `(cursor, messages)` only — with no compile-time hint that a third slot exists — silently makes the dead-letter path invisible.** + +**Sample audit prompt:** + +> Audit every `XAUTOCLAIM` call site across the 9 client implementations under `content/develop/use-cases/{{USE_CASE_NAME}}/`. For each, identify whether the helper goes through the client library's typed wrapper or through a raw command. For the typed wrappers, verify against the library's documentation or source whether the wrapper surfaces all three reply elements (next-cursor, claimed-messages, deleted-IDs). Flag any helper that uses a typed wrapper whose return type omits the deleted-IDs slot — that helper has silently lost the dead-letter signalling path. Cross-check the helper's `_index.md` "Production usage" prose to confirm the deleted-IDs handling is documented for the reader. + +**Why on list:** Streaming use case, Phase 2 cross-port finding. 
Confirmed in **five** independent ports: + +- **go-redis v9.18.0** — `client.XAutoClaim(...)` and `XAutoClaimJustID(...)` both parse the reply and call `rd.DiscardNext()` on the third element. Workaround: `client.Do(ctx, "XAUTOCLAIM", ...)` with manual parsing. +- **Jedis 5.0.1 and 6.2.0** — `xautoclaim(...)` returns `Map.Entry<StreamEntryID, List<StreamEntry>>` (only 2 slots). Workaround: `Jedis.sendCommand(STREAM_AUTOCLAIM, ...)` with manual decode. +- **Lettuce 6.5.0** — `RedisCommands.xautoclaim(...)` returns `ClaimedMessages` exposing only the cursor and claimed messages. Workaround: `connection.dispatch(CommandType.XAUTOCLAIM, new NestedMultiOutput<>(...), args)`. +- **redis-rb 5.x** — typed `redis.xautoclaim` is decoded via the generic `HashifyStreamAutoclaim` proc, which drops the third element. Workaround: `redis.call('XAUTOCLAIM', ...)` with manual parsing. +- **redis-rs 0.24** — no typed `xautoclaim` wrapper exists at all, so the helper must use `redis::cmd("XAUTOCLAIM").arg(...).query_async()` directly. + +This is the most common class of finding in streaming-style ports. The reference's `(claimed, deleted_ids)` API surface assumed wrappers preserve all three reply elements; they don't. Every future port must verify whether its library's typed wrapper has caught up before relying on it. + +--- + +## 23. Handover-then-delete safety on consumer removal + +**What to scan for:** any helper / demo path that removes a consumer from a consumer group. Look for the sequence (a) hand over the consumer's pending entries to a peer, then (b) `XGROUP DELCONSUMER`. The handover is typically a per-consumer `XPENDING ... CONSUMER` walk plus `XCLAIM` at `MIN-IDLE-TIME 0`. + +**Pass criterion:** the `XGROUP DELCONSUMER` call must run **only after the handover has provably succeeded**. Specifically: + +- Every error from the handover path (`XPENDING` failure, `XCLAIM` failure, partial-batch break, deadline timeout, etc.) must abort the removal. Do not log-and-continue.
+- The handover must verify the source consumer's PEL is empty before deletion, OR the caller must surface the partial-handover failure so the user can retry. +- The registry-removal step (popping from the in-process workers map) must happen **after** the destructive `DELCONSUMER`, not before — otherwise a thrown exception between map-pop and DELCONSUMER leaves a half-removed worker. + +A naked `try { handover() } catch { ignore } finally { delete_consumer() }` is the **wrong shape**. `XGROUP DELCONSUMER` destroys the PEL of the deleted consumer — any entries the handover failed to move are unreachable by `XAUTOCLAIM` afterwards. The destruction is silent: no error, no log on the Redis side, no count of lost messages. + +**Sample audit prompt:** + +> Audit every consumer-removal path in the 9 client implementations under `content/develop/use-cases/{{USE_CASE_NAME}}/`. For each port's `remove_worker` (or equivalent) helper, trace the error-handling boundary between the `handover_pending` (or equivalent) call and the `XGROUP DELCONSUMER` call. Flag any port where: (a) handover errors are silently swallowed before delete fires; (b) the in-process registry entry is removed before delete fires (so a thrown exception between the two leaves a half-removed worker); (c) a partial-handover return value is accepted without verifying the source consumer's PEL is empty. Cross-check the demo's HTTP `/remove-worker` handler — if it returns 200 on a failed handover, the bug is user-visible. + +**Why on list:** Streaming use case, Phase 4b Codex independent review. Targeted Phase 4 audits cleared `remove_worker` paths in `rust`, `go`, `nodejs`, and `dotnet`; Codex's fresh-context review then found that all four shipped variants of the same pattern: + +- **rust** ([`demo_server.rs:154-160`](../../../content/develop/use-cases/streaming/rust/demo_server.rs)) — `handover_pending(...).await.unwrap_or(0)` swallows errors, then `delete_consumer` runs unconditionally. 
`event_stream.rs:367-376` discards `XCLAIM` failures as an empty claim list. +- **go** ([`demo_server.go:187-193`](../../../content/develop/use-cases/streaming/go/demo_server.go)) — `HandoverPending` correctly returns errors, but the caller logs them and continues to `DeleteConsumer`. +- **nodejs** ([`demoServer.js:635-649`](../../../content/develop/use-cases/streaming/nodejs/demoServer.js)) — `handoverPending` breaks and returns a partial count on `xPendingRange` or `xClaim` errors (`eventStream.js:365-399`). `removeWorker` then deletes regardless. +- **dotnet** ([`Program.cs:429-433`](../../../content/develop/use-cases/streaming/dotnet/Program.cs)) — `HandoverPending` catches `RedisServerException` and breaks early (`EventStream.cs:321-333`), returning whatever count it has. The caller stops the worker and deletes the consumer; if `StreamClaim` threw, the worker is already gone from `_workers` before `DELCONSUMER` runs. + +The reference (`redis-py/demo_server.py:590-598` + `event_stream.py:263-274`) aborts on handover errors before `delete_consumer` is reached, but the reference's `handover_pending` raises rather than returning partial counts — so the safe pattern is implicit and easy to miss when porting to languages where errors are returned values. + +--- + ## How to add a new row When a bug class is identified after this skill has been used: