diff --git a/blog/2026-06-09-pp-hicache-consistency.md b/blog/2026-06-09-pp-hicache-consistency.md
new file mode 100644
index 000000000..7e96dcdb7
--- /dev/null
+++ b/blog/2026-06-09-pp-hicache-consistency.md
@@ -0,0 +1,401 @@
+---
+title: "Host Tree Consistency for HiCache under Pipeline Parallelism: Problem and Fix"
+author: "Yanbo Yang, Zhangheng Huang, Shangming Cai, Chao Shi, Tingwei Huang, Zhiqiang Xie"
+date: "June 9, 2026"
+previewImg: /images/blog/pp_hicache_consistency/preview.png
+---
+
+## 1. Prologue: A Combination That "Looks Like It Shouldn't Break"
+
+In agentic and long-context inference scenarios, requests often **share very long prefixes**—system prompts, tool definitions, multi-turn conversation history—easily reaching tens of thousands of tokens. Recomputing this shared prefix for every request is absurdly expensive.
+
+> **Series context.** This article is a follow-up to the SGLang Pipeline Parallelism release blog, [*Pipeline Parallelism in SGLang*](https://www.lmsys.org/blog/2026-01-15-chunked-pipeline/), which introduced SGLang's highly optimized **Pipeline Parallelism (PP)** implementation—**Chunked Pipeline Parallelism (CPP)**, **Asynchronous P2P Communication**, and a simple yet effective **Dynamic Chunking** mechanism—compatible with other parallel strategies, **PD Disaggregation**, and **HiCache**. The key message we want to lead with: **for agentic serving, PP is critical.** Agentic workloads run very large models across many GPUs; PP is what lets those models both *fit* and *scale* with high throughput, which makes it a default building block rather than an optional optimization.
+>
+> That release covers the **initial / preliminary** stage of PP support in SGLang: it lands the core architecture and a "production-ready" path that is, in principle, compatible with PD Disaggregation and HiCache. But once PP is layered on top of **L3 persistent storage** under real production load, consistency corner cases emerge that the first implementation did not fully cover. This article zooms into one of them—**host radix tree divergence across PP ranks when HiCache L3 is enabled**—dissecting the root cause level by level and detailing the fix, so that the PP + HiCache combination is hardened from "works in principle" toward "robust in production."
+>
+> Companion material: interactive animation `hicache_pp_animation.html`, minimal repro script `dual_prefetch_groups_demo.py`; design and PR plan in upstream issue [sgl-project/sglang#22607](https://github.com/sgl-project/sglang/issues/22607).
+
+SGLang solves this with **HiCache**: a three-level KV cache hierarchy.
+
+<img src="/images/blog/pp_hicache_consistency/hicache_hierarchy.svg" alt="HiCache three-level KV cache hierarchy (L1 GPU / L2 host radix tree / L3 persistent store)" style="display:block;margin:0 auto;width:100%;max-width:820px;" />
+
+L3 persistence lets prefixes be reused not only across requests, but also after a process restart.
+
+Large models with dozens to hundreds of layers (e.g. DeepSeek-V3.2, GLM-5.1, DeepSeek-V4 Pro) require **Pipeline Parallelism (PP)** to split layers across multiple GPU groups, and are often combined with **disaggregated prefill**.
+
+**Why PP + L3 is a must-have configuration for today's agentic serving.** Agentic request shapes are highly distinctive: the shared prefix formed by system prompt + tool definitions + multi-turn history easily reaches tens of thousands of tokens, and is reused repeatedly across huge volumes of requests—and even after process restarts. To sustain this load in production, two things are indispensable:
+
+- **PP determines "fits and scales"**: as models grow ever larger (dozens of layers, hundreds of billions of parameters), only PP—splitting layers across multiple GPU groups—can both fit the weights and sustain high throughput via pipelining;
+- **L3 determines the "hit-rate ceiling"**: the shared prefix must be **persistently** cached. L3 (external distributed storage such as Mooncake) lifts prefix reuse beyond the limits of single-node host memory and a single process lifetime, raising cache hits from "within a session" to "global + across restarts", which directly drives **TTFT and per-token cost**.
+
+Therefore **PP + L3 is not an optional optimization but the default foundation for scaled agentic serving**. And it is precisely this most production-valuable combination that triggers the host tree consistency defect this article dissects and fixes—in other words, the closer you get to high-value production scenarios, the harder this bug is to avoid.
+
+But layering PP on top of L3 storage introduces a consistency defect absent in simpler configurations, manifesting as a **shape mismatch crash** rather than a numerical deviation. This article analyzes the cause level by level and explains the fix.
+
+## 2. The Consistency Invariant
+
+Under PP, each rank runs an independent scheduler, each maintaining its own radix tree. The core constraint of the system is:
+
+> The host radix tree on all ranks must remain **structurally identical**. If any rank's tree gains or loses even one node, the difference is amplified by subsequent operations and ultimately causes the cross-rank collective communication that depends on tree state to crash on a shape mismatch.
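+
+One way to make this invariant checkable is to reduce each rank's tree to a structural fingerprint and compare the fingerprints. The sketch below is illustrative only: `Node`, `insert`, and `tree_fingerprint` are hypothetical names rather than SGLang APIs, and the tree is modeled as nested dictionaries keyed by token pages.
+
+```python
+import hashlib
+from dataclasses import dataclass, field
+
+
+@dataclass
+class Node:
+    """Hypothetical radix-tree node: children keyed by their token page."""
+    children: dict = field(default_factory=dict)
+
+
+def insert(root: Node, pages: list) -> None:
+    """Insert a sequence of token pages (tuples) as a path below root."""
+    node = root
+    for page in pages:
+        node = node.children.setdefault(tuple(page), Node())
+
+
+def tree_fingerprint(node: Node) -> str:
+    """Structural hash: identical shape and keys give an identical digest."""
+    h = hashlib.sha256()
+    for key in sorted(node.children):
+        h.update(repr(key).encode())
+        h.update(tree_fingerprint(node.children[key]).encode())
+    return h.hexdigest()
+
+
+# Two ranks that applied the same updates agree ...
+rank0, rank1 = Node(), Node()
+for tree in (rank0, rank1):
+    insert(tree, [(1, 2), (3, 4)])
+same_before = tree_fingerprint(rank0) == tree_fingerprint(rank1)
+
+# ... until one rank gains a single extra node.
+insert(rank1, [(1, 2), (3, 4), (5, 6)])
+same_after = tree_fingerprint(rank0) == tree_fingerprint(rank1)
+print(same_before, same_after)  # prints: True False
+```
+
+A check like this is useful as a debugging aid; it detects divergence but does nothing to prevent it, which is what the rest of the article is about.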
+
+## 3. Level-by-Level Analysis of the Divergence Cause
+
+### 3.1 L1-only + PP (no HiCache): no divergence
+
+Each rank receives the same requests in a consistent order via P2P. `match_prefix` operates on the device radix tree; its insertions and evictions are driven entirely by the same batch selection and complete synchronously within each scheduler cycle. Determinism holds, and there is no source of divergence.
+
+### 3.2 L1+L2 + PP (HiCache without storage): already crashes
+
+Adding the host cache introduces `write-through` (GPU→CPU backup) and `load-back` (CPU→GPU restore). Although `writing_check()` / `loading_check()` are called at deterministic points in the event loop, the underlying backup / load is **asynchronous IO**: completion events land in each rank's own queues (`ack_write_queue`, `ack_load_queue`), while prefetch completion is picked up by the main thread polling `check_prefetch_progress`. Within the same cycle, the scheduler threads on PP0 and PP1 may consume **different numbers** of completion events, thus applying a different number of updates to the host tree, causing `matched_host` to diverge and crash.
+
+Minimal repro from the fix PR [#27285](https://github.com/sgl-project/sglang/pull/27285):
+
+```bash
+sglang serve --model-path=Qwen/Qwen3-32B --pp-size=2 \
+  --enable-hierarchical-cache --max-total-tokens=$((256*1024))
+python -m sglang.bench_serving --num-prompts 1000
+# RuntimeError: shape '[3013, -1, 128]' is invalid for input of size 8192000
+```
+
+The L2-level fix is **`pp_sync`**: PP0's scheduler thread decides how many completion events each queue should consume this cycle, and PP1 consumes exactly the same number, eliminating divergence caused by the two scheduler threads finishing async work at different times. This mechanism is a **directional synchronization between scheduler threads**, a different category from the symmetric MIN used by L3 (see Sections 5–6). L2 and L3 each have an independent consistency defect requiring its own fix; this article focuses on L3, but the L2 problem is real and must not be skipped.
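+
+The idea behind `pp_sync` can be sketched without any distributed runtime. The snippet below is a simplified model, not the PR's code: `agreed_drain_count` and `drain` are hypothetical helpers, and the agreed count is assumed to be the minimum queue size across ranks (the qsize MIN that Section 6.5 describes), so that no rank is asked for more events than it holds.
+
+```python
+from collections import deque
+
+
+def agreed_drain_count(qsizes: list) -> int:
+    """Leader-side decision: never ask a rank for more events than it holds."""
+    return min(qsizes)
+
+
+def drain(queue: deque, n: int) -> list:
+    """Consume exactly n completion events; the rest wait for a later cycle."""
+    return [queue.popleft() for _ in range(n)]
+
+
+# Same logical events, but async IO has delivered more of them on PP0 so far.
+pp0 = deque(["w0", "w1", "w2"])
+pp1 = deque(["w0", "w1"])
+
+n = agreed_drain_count([len(pp0), len(pp1)])  # stands in for the cross-rank sync
+applied0, applied1 = drain(pp0, n), drain(pp1, n)
+print(applied0 == applied1, list(pp0))  # prints: True ['w2']
+```
+
+Both ranks apply the same two updates this cycle; the extra event on PP0 is simply deferred rather than applied early.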
+
+### 3.3 L1+L2+L3 + PP (HiCache with storage): consistency breaks here
+
+L3 introduces a **prefetch thread**—an **asynchronous** background thread that independently queries external storage on each rank. Divergence is caused by four mutually reinforcing factors:
+
+1. **Async completion timing**: each rank's prefetch thread finishes at a different wall-clock moment. The one that finishes first immediately inserts a node into its host tree, while the laggard has not updated yet. The next `match_prefix` sees a different host tree state on different ranks, yielding different `host_hit_length` and `prefix_indices`.
+2. **Anchor divergence**: an L3 query uses a hash chain starting from some anchor node in the host tree. If one rank already inserted a node from the previous prefetch (`host_hit=896`), its anchor and token range differ from a rank that has not updated yet (`host_hit=0`); the two compute different hash chains for the same request and fetch different—or even wrong—data from storage.
+3. **Wall-clock LRU eviction divergence**: LRU uses `last_access_time = time.monotonic()`, which differs across ranks at the microsecond level, leading to different victim node choices, different GPU→CPU demotions, different host memory pressure, and hence a different `evictable_host_leaves` candidate set.
+4. **Amplifying cascade**: once the host tree diverges, subsequent eviction decisions, write-through timing, the next request's prefetch anchor, etc. all further amplify the difference, until a shape mismatch crash.
+
+The essence: L1 operations are synchronous and deterministic within a cycle, and L2's asynchronous completions (Section 3.2) differ only in *how many* events each rank consumes per cycle, which `pp_sync` aligns. L3 prefetch, by contrast, is asynchronous and state-dependent. Async completion timing, state-dependent query parameters, and wall-clock-dependent eviction order together form a positive feedback loop that amplifies a tiny divergence into a crash.
+
+**The Lifecycle of an L3 Request and the Two Divergence Quantities**
+
+To set up the fix below, we first use the **main branch (before the L3 fix)** to explain how the prefetch path mutates the host tree, and which quantities diverge across ranks.
+
+The whole path spans two layers: the **background IO layer** (`HiCacheController`, `cache_controller.py`) and the **sole tree writer**, the scheduler main thread (`HiRadixCache`, `hiradix_cache.py`). On main, the two layers are coupled in two ways:
+
+- **Request dispatch and revocation go through queues**: `prefetch_queue` (main thread dispatches `PrefetchOperation`), `prefetch_buffer` (handed to actual IO after a hit), `prefetch_revoke_queue` (revoke when the hit is insufficient).
+- **Completion results go through a shared object + main-thread polling**: after the background IO thread loads pages into host memory, it **updates `completed_tokens` in place on the same `operation`**; the main thread, in its event loop, calls `check_prefetch_progress(req_id)` to poll each in-flight request's state, and only takes the result and writes the tree once the termination condition is met. **main has no `PrefetchAck`, no `prefetch_sync_queue`, and no background sync thread.**
+
+<img src="/images/blog/pp_hicache_consistency/l3_prefetch_problem.png" alt="main-branch prefetch flow: shared operation object + main-thread check_prefetch_progress polling" style="display:block;margin:0 auto;width:100%;max-width:820px;" />
+
+<p align="center">
+  <img src="/images/blog/pp_hicache_consistency/lifecycle.gif" alt="Two-Request Lifecycle (L3 miss/hit, host-tree consistency)" style="display:block;margin:0 auto;width:100%;max-width:960px;border:1px solid #30363d;border-radius:12px;background:#0e1117" />
+</p>
+
+> 🎬 **Interactive demo — Two-Request Lifecycle (L3 miss/hit, host-tree consistency).** If the embed above doesn't render (e.g. on plain GitHub), open [interactive version](/images/blog/pp_hicache_consistency/hicache_pp_animation_en_lifecycle.html) in a browser.
+
+Note two key facts: first, **both divergence quantities undergo one MIN reduction on main**—`storage_hit_count` in the background thread, `completed_tokens` in the main thread—but **both reductions cover only the TP/CP group and exclude PP**; second, **the tree write is triggered by the main thread polling each request**, not driven by background completion events. These two points are exactly where Section 4 pinpoints main's bugs.
+
+Accordingly, the three paths can be stated as:
+
+**miss path**: `match_prefix` misses in L2, `_storage_hit_query` misses in L3, falls back to GPU forward compute; the result is written into the L2 host tree via `insert`, then persisted to L3 by the backup thread via `write_backup` / `page_set`.
+
+**hit path**: `prefetch_thread` obtains the hit page count via `_storage_hit_query` (which internally calls `storage_backend.batch_exists`) and puts it into `prefetch_buffer`; `prefetch_io_aux_thread` pulls pages back to host batch by batch via `_page_transfer` (which internally calls `page_get`) and accumulates `completed_tokens` in place; the main thread, in `check_prefetch_progress`, retrieves the result via `terminate_prefetch`, takes the MIN of `completed_tokens`, and inserts the hit prefix into the host tree via `_insert_helper_host`.
+
+**eviction path**: under host memory pressure, the scheduler main thread deletes nodes from the host tree via `evict_host` (L3 still retains the corresponding pages).
+
+Along the whole path, two quantities naturally diverge across ranks, and both directly determine the host tree's insertion length:
+
+- **`storage_hit_count` (divergence #1)**: comes from the `batch_exists` query result. Each rank's host view and L3 visibility differ, so the return value can differ.
+- **`completed_tokens` (divergence #2)**: comes from the actual load result of `page_get`. Even with the same prefetch range, per-page loading may still partially fail to different degrees on different ranks.
+
+The host tree's growth (how many pages of prefix get inserted) is jointly determined by these two quantities. **If either quantity is not unified across ranks, the insertion length diverges and the host tree becomes inconsistent.**
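+
+A small worked example makes the dependency concrete. This is a sketch under stated assumptions: `inserted_pages` is a hypothetical helper, the page size of 64 and the per-rank numbers are made up for illustration, and the insertion length is modeled as bounded by the smaller of the two quantities.
+
+```python
+PAGE_SIZE = 64  # assumed page size, for illustration only
+
+
+def inserted_pages(storage_hit_count: int, completed_tokens: int) -> int:
+    """Pages that reach the host tree: bounded by the query hit and the load."""
+    return min(storage_hit_count, completed_tokens) // PAGE_SIZE
+
+
+# Hypothetical per-rank observations for one request.
+ranks = {
+    "pp0": {"storage_hit_count": 896, "completed_tokens": 896},
+    "pp1": {"storage_hit_count": 896, "completed_tokens": 640},  # partial load
+}
+
+per_rank = {name: inserted_pages(**obs) for name, obs in ranks.items()}
+print(per_rank)  # prints: {'pp0': 14, 'pp1': 10}
+
+# Taking a MIN of each quantity across ranks yields one shared length.
+hit = min(obs["storage_hit_count"] for obs in ranks.values())
+done = min(obs["completed_tokens"] for obs in ranks.values())
+unified = inserted_pages(hit, done)
+print(unified)  # prints: 10
+```
+
+Without unification the two ranks would insert 14 and 10 pages for the same request, which is exactly the kind of one-node difference the invariant in Section 2 forbids.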
+
+## 4. The Sync Logic on main and Its Two Bugs
+
+The main branch **does a MIN reduction on both divergence quantities**, but neither the reduction scope nor the trigger mechanism is sufficient to cover PP, so it still diverges. The logic is as follows.
+
+**Each divergence quantity has one MIN, but both cover only TP/CP.** `storage_hit_count` is reduced in the background `prefetch_thread_func`, `completed_tokens` is reduced in the main-thread `check_prefetch_progress`; both use the `attn` group (TP/CP) and **exclude `pp_group`**:
+
+```python
+# main: prefetch_thread_func (background thread) — unify prefetch range
+hash_value, storage_hit_count = self._storage_hit_query(operation)
+self._all_reduce_prefetch_groups(storage_hit_count_tensor, ReduceOp.MIN)   # prefetch_sync_groups, TP/CP only
+operation.hash_value   = hash_value[: storage_hit_count // self.page_size]
+operation.host_indices = operation.host_indices[:storage_hit_count]
+
+# main: check_prefetch_progress (main thread, per-request polling) — unify insertion length
+completed_tokens, hash_value = self.cache_controller.terminate_prefetch(operation)
+self._all_reduce_attn_groups(completed_tokens_tensor, ReduceOp.MIN)        # attn_cp/attn_tp only = TP/CP
+min_completed_tokens = completed_tokens_tensor.item()
+matched_length = self._insert_helper_host(..., hash_value[: min_completed_tokens // self.page_size])
+```
+
+And `_create_prefetch_sync_groups` creates only one set of `prefetch_sync_groups`, whose members likewise come from the `attn` group and exclude PP; there is no second set:
+
+```python
+# main: _create_prefetch_sync_groups
+base_groups = [self.tp_group]            # or attn_cp_group / attn_tp_group; no pp_group, no second set
+```
+
+This produces two bugs:
+
+- **Bug 1: neither MIN covers PP.** The reductions happen only within TP/CP groups; between PP stages there is no alignment of `storage_hit_count` or `completed_tokens` whatsoever. The prefetch range and insertion length for the same request can differ across PP stages, so the host tree diverges across stages directly.
+- **Bug 2: tree writes are triggered by main-thread per-request polling, with no constraint across PP on "which and how many land this cycle".** main's insert happens in `check_prefetch_progress`, where the main thread polls and terminates each in-flight request independently. Because prefetch is asynchronous, the set and number of requests that each PP rank terminates and writes within the same cycle can differ (there is no `pp_sync` / qsize alignment mechanism); combined with the async completion timing, anchor divergence, and wall-clock LRU eviction from Section 3.3, once the host tree gets out of step it is amplified.
+
+```text
+main:  MIN(storage_hit) and MIN(completed_tokens) only within TP/CP, no sync on the PP dimension
+       tree writes triggered by main-thread per-request polling; the set/number finalized differs across PP
+       └──▶ across PP stages → insert/delete diverge → host tree inconsistent → crash
+```
+
+In sum, main's synchronization is "two MINs covering only TP/CP, tree writes via main-thread per-request polling with no alignment across PP", which cannot guarantee host radix tree consistency under PP + L3.
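+
+The scope problem in Bug 1 can be reproduced on paper with a 2×2 grid of ranks. The following is a pure-Python simulation, not SGLang code: `min_within` is a hypothetical stand-in for an `all_reduce(MIN)` run independently inside each group, the hit counts are invented, and applying the TP reduction followed by a PP reduction is an assumption about how a list of groups would be used.
+
+```python
+def min_within(values: dict, groups: list) -> dict:
+    """Simulate all_reduce(MIN) run independently inside each group of ranks."""
+    out = dict(values)
+    for group in groups:
+        m = min(values[r] for r in group)
+        for r in group:
+            out[r] = m
+    return out
+
+
+# Ranks are (pp_stage, tp_rank); values are hypothetical storage_hit_count results.
+hits = {(0, 0): 896, (0, 1): 832, (1, 0): 640, (1, 1): 704}
+tp_groups = [[(0, 0), (0, 1)], [(1, 0), (1, 1)]]  # ranks sharing a PP stage
+pp_groups = [[(0, 0), (1, 0)], [(0, 1), (1, 1)]]  # ranks sharing a TP index
+
+tp_only = min_within(hits, tp_groups)        # the scope main reduces over
+tp_then_pp = min_within(tp_only, pp_groups)  # scope extended to the PP dimension
+
+print(sorted(set(tp_only.values())))     # prints: [640, 832]
+print(sorted(set(tp_then_pp.values())))  # prints: [640]
+```
+
+With the TP-only scope, stage 0 settles on 832 and stage 1 on 640, so the stages disagree; once the PP dimension is included, every rank holds the global minimum.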
+
+## 5. Overview of the Fix
+
+The goal of the fix: make the **insertion/deletion on the host tree fully identical in both length and count** across all TP + PP ranks, without introducing deadlock and without blocking GPU compute. The solution consists of three classes of communication channels (matching the design in issue #22607), each governing one class of divergence source.
+
+<img src="/images/blog/pp_hicache_consistency/fix_three_channels.svg" alt="The fix: three classes of communication channels (PG1 / PG2 MIN + pp_sync) governing each divergence source" style="display:block;margin:0 auto;width:100%;max-width:820px;" />
+
+Design intent of the three channels:
+
+1. **PG1 and PG2 have identical members yet are built as two sets**: the reductions of `storage_hit_count` and `completed_tokens` run on two different background threads, and a single gloo communicator cannot safely be `all_reduce`d concurrently by two threads (the calls could misalign or even deadlock), so each thread owns its own set.
+2. **Background uses gloo (CPU), isolated from the scheduler thread's NCCL (GPU)**: the background `all_reduce` goes over CPU communication and does not occupy or block the CUDA collective stream where forward compute runs.
+3. **The two semantics differ**: PG1/PG2 are symmetric global MINs that pull each rank's progress down to the "slowest"; channel 3 is a single-point decision by PP0 followed by a directed downstream broadcast, so termination timing and drain count are unified by the leader.
+
+Channel 3 is exactly the L2-level `pp_sync` mechanism mentioned in Section 3.2, and it applies equally to consuming L2 and L3 completion events; PG1/PG2 are the core new additions of this L3 fix.
+
+<p align="center">
+  <img src="/images/blog/pp_hicache_consistency/consistency.gif" alt="Tree Consistency (MIN all-reduce keeps every PP/TP rank identical)" style="display:block;margin:0 auto;width:100%;max-width:960px;border:1px solid #30363d;border-radius:12px;background:#0e1117" />
+</p>
+
+> 🎬 **Interactive demo — Tree Consistency (auto-play).** How the MIN all-reduce keeps every PP/TP rank's radix tree identical. If the embed doesn't render, open [interactive version](/images/blog/pp_hicache_consistency/hicache_pp_animation_en_consistency.html) in a browser.
+
+### Design evolution: from a store-side MIN to an in-engine MIN thread
+
+It is worth recording how we arrived at this design, because the first version solved the same problem from a different layer. Initially we did **not** run the `storage_hit_count` MIN inside SGLang at all—we pushed it down into the **Mooncake store query layer**. When ranks from different PP stages issued their storage-hit queries, the store recognized them as one group (by a PP/TP group key) and returned the **group-wide MIN** hit length directly, so every rank received an already-unified prefetch range.
+
+That worked, but it coupled a correctness-critical invariant of the inference engine to the external storage backend: the store had to be aware of SGLang's parallel topology and group membership, the reduction semantics lived outside the engine, and any other storage backend would have to re-implement the same logic. It also could not handle the *second* divergence (`completed_tokens`) symmetrically, since that quantity only materializes during the actual page transfer **inside** the engine, not at query time.
+
+So we moved the MIN back into SGLang, onto a dedicated background thread (`prefetch_thread` doing `all_reduce(MIN)` over `prefetch_hits_sync_groups`). The engine now owns both reductions end-to-end (PG1 = `prefetch_hits_sync_groups` for the prefetch range, PG2 = `prefetch_completion_sync_groups` for the landed length), the storage backend stays a topology-agnostic key-value store, and the two divergence sources are unified by one uniform mechanism. The rest of this article describes that final, in-engine design.
+
+## 6. Implementation Walkthrough (current branch)
+
+<p align="center">
+  <img src="/images/blog/pp_hicache_consistency/threads.gif" alt="Thread Relationships & Tree Consistency" style="display:block;margin:0 auto;width:100%;max-width:960px;border:1px solid #30363d;border-radius:12px;background:#0e1117" />
+</p>
+
+> 🎬 **Interactive demo — Thread Relationships & Tree Consistency.** The roles of `prefetch_thread` / `prefetch_io_aux_thread` / `prefetch_sync_thread` and how their queues/MINs feed the sole tree writer. If the embed doesn't render, open [interactive version](/images/blog/pp_hicache_consistency/hicache_pp_animation_en_threads.html) in a browser.
+
+### 6.1 Bring pp_group into the sync, and build two independent sets
+
+`_create_sync_groups`, building on main, appends `pp_group`, and is called twice to build two independent sets—`prefetch_hits_sync_groups` (PG1) and `prefetch_completion_sync_groups` (PG2):
+
+```python
+# current branch: _create_sync_groups
+base_groups = [self.tp_group]            # or attn_cp_group / attn_tp_group
+if self.pp_group is not None:            # HACK: bring the PP ring into the sync
+    base_groups.append(self.pp_group)
+groups = []
+for group in base_groups:
+    ...
+    groups.append(create_custom_parallel_group(..., backend="gloo"))
+return groups
+
+# called twice -> two independent sets
+self.prefetch_hits_sync_groups = self._create_sync_groups()        # PG1
+self.prefetch_completion_sync_groups = self._create_sync_groups()  # PG2
+```
+
+The sync scope expands from "TP ring only" to "TP ring + PP ring", covering cross-PP divergence at the root.
+
+### 6.2 First MIN: unify the prefetch range (PG1)
+
+`prefetch_thread_func` does `all_reduce(MIN)` on `storage_hit_count` and truncates the prefetch set accordingly. Because every rank then holds a `hash_value` of the same length, this step also fixes the subsequent batch count, and hence the ack count, to be identical across ranks:
+
+```python
+# current branch: prefetch_thread_func
+hash_value, storage_hit_count = self._storage_hit_query(operation)
+self._all_reduce(storage_hit_count_tensor, ReduceOp.MIN, self.prefetch_hits_sync_groups)   # @ PG1
+storage_hit_count = storage_hit_count_tensor.item()
+operation.hash_value   = hash_value[: storage_hit_count // self.page_size]
+operation.host_indices = operation.host_indices[:storage_hit_count]
+```
+
+<p align="center">
+  <img src="/images/blog/pp_hicache_consistency/skew.gif" alt="Async Skew × MIN Lockstep" style="display:block;margin:0 auto;width:100%;max-width:960px;border:1px solid #30363d;border-radius:12px;background:#0e1117" />
+</p>
+
+> 🎬 **Interactive demo — Async Skew × MIN Lockstep.** Background prefetch threads finish at different wall-clock times; the `all_reduce(MIN)` pulls every rank back into lockstep on the same prefetch range. If the embed doesn't render, open [interactive version](/images/blog/pp_hicache_consistency/hicache_pp_animation_en_skew.html) in a browser.
+
+### 6.3 Exactly one PrefetchAck per batch
+
+`_page_transfer` is changed so that on error it **no longer breaks**, but keeps looping and emits exactly one `PrefetchAck` per batch, guaranteeing each rank produces the **same number** of acks:
+
+```python
+# current branch: _page_transfer
+for i in range(0, len(operation.hash_value), self.storage_batch_size):
+    if ok and operation.is_asked_to_terminate():
+        ok = False
+    if ok:
+        n = self.page_get_func(operation, batch_hashes, batch_host_indices, extra_info)
+        if n != len(batch_hashes):
+            ok = False
+        completed_tokens += n * self.page_size
+    ack = PrefetchAck(rid=..., completed_tokens=completed_tokens, ...)
+    self.prefetch_sync_queue.put(ack)    # exactly one ack per batch, even on error
+```
+
+<p align="center">
+  <img src="/images/blog/pp_hicache_consistency/ackalign.gif" alt="PrefetchAck Alignment & Anti-Hang" style="display:block;margin:0 auto;width:100%;max-width:960px;border:1px solid #30363d;border-radius:12px;background:#0e1117" />
+</p>
+
+> 🎬 **Interactive demo — PrefetchAck Alignment & Anti-Hang.** Why emitting exactly one ack per batch (even on error) keeps the collective call count equal per rank and prevents a permanent hang. If the embed doesn't render, open [interactive version](/images/blog/pp_hicache_consistency/hicache_pp_animation_en_ackalign.html) in a browser.
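+
+To see why the loop must not exit early, the sketch below counts acks under both behaviours. It is a simplified model rather than the real `_page_transfer`: `transfer` is a hypothetical function, each batch is a `(requested_pages, loaded_pages)` pair with invented values, and the `break_on_error=True` branch is a hypothetical early-exit variant used only for contrast.
+
+```python
+def transfer(batches: list, page_size: int, break_on_error: bool) -> list:
+    """Return the completed_tokens value carried by each emitted ack."""
+    acks, completed, ok = [], 0, True
+    for requested, loaded in batches:
+        if ok:
+            completed += loaded * page_size
+            if loaded != requested:
+                ok = False
+                if break_on_error:
+                    break  # early exit: no ack for this batch or later ones
+        acks.append(completed)  # otherwise exactly one ack per batch
+    return acks
+
+
+rank0 = [(2, 2), (2, 2), (2, 2)]
+rank1 = [(2, 2), (2, 1), (2, 2)]  # second batch loads only one page here
+
+early0 = transfer(rank0, 64, break_on_error=True)
+early1 = transfer(rank1, 64, break_on_error=True)
+print(len(early0), len(early1))  # prints: 3 1
+
+full0 = transfer(rank0, 64, break_on_error=False)
+full1 = transfer(rank1, 64, break_on_error=False)
+print(len(full0), len(full1))    # prints: 3 3
+
+# With equal ack counts, a per-ack MIN across ranks is well defined.
+unified = [min(a, b) for a, b in zip(full0, full1)]
+print(unified)                   # prints: [128, 192, 192]
+```
+
+In the early-exit variant, rank 0 would enter three reductions while rank 1 enters one, so two of rank 0's collective calls would never find a partner. Emitting one ack per batch keeps the call counts equal even when a load partially fails.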
+
+### 6.4 Second MIN: unify the insertion length (PG2)
+
+`prefetch_sync_thread_func` does `all_reduce(MIN)` on each ack's `completed_tokens` and writes it back into the ack. Each ack corresponds to one reduction:
+
+```python
+# current branch: prefetch_sync_thread_func
+ack = self.prefetch_sync_queue.get(...)
+self._all_reduce(completed_tokens_tensor, ReduceOp.MIN, self.prefetch_completion_sync_groups)   # @ PG2
+ack.completed_tokens = completed_tokens_tensor.item()
+self.ack_prefetch_queue.put(ack)
+```
+
+### 6.5 Main thread writes the tree using the unified values
+
+The scheduler main thread `drain_storage_control_queues` first takes the MIN of each queue's qsize via channel 3 (`_all_reduce` + `_pp_sync` directed broadcast), so each rank consumes the **same number** of acks; then `_handle_prefetch_result` decides the insertion length using the post-MIN `completed_tokens`:
+
+```python
+# current branch: _handle_prefetch_result
+completed_tokens = operation.completed_tokens         # unified via PG2 MIN, identical per rank
+fetched_key      = prefetch_key[:completed_tokens]
+written_indices  = host_indices[:completed_tokens]
+matched_length   = self._insert_helper_host(
+    last_host_node, fetched_key, written_indices,
+    hash_value[: completed_tokens // self.page_size],
+)
+```
+
+## 7. Why Two gloo Groups Won't Deadlock: A Concurrency-Safety Demo
+
+The fix must dodge two independent deadlock sources at once: one is "each rank must make an equal number of collective calls" (guaranteed by Section 6.3's "exactly one ack per batch"); the other is the topic of this section—**two background threads using collectives concurrently**. Both must hold simultaneously; missing either still hangs.
+
+### 7.1 The core risk: one communicator used by two threads concurrently
+
+gloo's `all_reduce` is a stateful rendezvous: a single ProcessGroup (communicator) maintains a set of message sequence numbers and matching state underneath, and is **not guaranteed thread-safe under concurrency**. If two background threads (`prefetch_thread` reducing `storage_hit_count`, `prefetch_sync_thread` reducing `completed_tokens`) launch `all_reduce` **concurrently** on the **same** communicator, the relative order in which the two threads enter the collective on each rank is uncontrolled:
+
+```text
+sharing one group, group1 (dangerous):
+  rank0:  threadA.all_reduce(group1)  arrives first   threadB.all_reduce(group1)  arrives later
+  rank1:  threadB.all_reduce(group1)  arrives first   threadA.all_reduce(group1)  arrives later
+          └──────────────┬──────────────┘
+          rank0's A and rank1's B get mismatched into the same rendezvous
+          → reduce values that shouldn't be reduced together / tag mismatch → data corruption or permanent block
+```
+
+### 7.2 The fix: each thread owns its own set
+
+`_create_sync_groups`, called twice, creates two independent gloo communicators over the **same set of ranks**—`prefetch_hits_sync_groups` (PG1) and `prefetch_completion_sync_groups` (PG2)—each dedicated to one background thread. The two collective streams travel over their own communicators, never interleave, and the rendezvous always pairs one-to-one within the "same group + same thread" semantics:
+
+```python
+# PG1 dedicated to prefetch_thread (storage_hit_count), PG2 to prefetch_sync_thread (completed_tokens)
+self.prefetch_hits_sync_groups = self._create_sync_groups()        # PG1
+self.prefetch_completion_sync_groups = self._create_sync_groups()  # PG2
+```
+
+<p align="center">
+  <img src="/images/blog/pp_hicache_consistency/deadlock.gif" alt="Why 2 Groups Avoid Deadlock" style="display:block;margin:0 auto;width:100%;max-width:960px;border:1px solid #30363d;border-radius:12px;background:#0e1117" />
+</p>
+
+> 🎬 **Interactive demo — Why 2 Groups Avoid Deadlock.** Two background threads each own a separate gloo group, so their concurrent `all_reduce`s never interleave into the same rendezvous. If the embed doesn't render, open [interactive version](/images/blog/pp_hicache_consistency/hicache_pp_animation_en_deadlock.html) in a browser.
+
+### 7.3 Minimal runnable example
+
+Below is a self-spawning, CPU-only minimal skeleton compressing the above structure into ~30 lines. `dual` uses two sets (the real design, completes cleanly); `shared` makes the two threads share one set (reproducing the interleave/block):
+
+```python
+import os, threading, time
+import torch
+import torch.distributed as dist
+import torch.multiprocessing as mp
+
+def worker(rank, world, mode, rounds, port):
+    os.environ["MASTER_ADDR"], os.environ["MASTER_PORT"] = "127.0.0.1", str(port)
+    dist.init_process_group("gloo", rank=rank, world_size=world)
+    ranks = list(range(world))
+    # call new_group twice over the same ranks -> two independent communicators
+    g1 = dist.new_group(ranks=ranks, backend="gloo")   # ~ prefetch_hits_sync_groups       (PG1)
+    g2 = dist.new_group(ranks=ranks, backend="gloo")   # ~ prefetch_completion_sync_groups (PG2)
+
+    def reduce_loop(group, base, n):
+        for _ in range(n):
+            t = torch.tensor([base + rank], dtype=torch.int32)
+            dist.all_reduce(t, op=dist.ReduceOp.MIN, group=group)  # MIN -> base
+            time.sleep(0.05)                                       # widen the window, amplify concurrent interleave
+
+    # dual: threadA->g1, threadB->g2 (safe);  shared: both threads share g1 (dangerous)
+    gA, gB = (g1, g2) if mode == "dual" else (g1, g1)
+    tA = threading.Thread(target=reduce_loop, args=(gA, 0, rounds))    # ~ storage_hit_count
+    tB = threading.Thread(target=reduce_loop, args=(gB, 100, rounds))  # ~ completed_tokens
+    tA.start(); tB.start(); tA.join(); tB.join()
+    print(f"[rank {rank}] done ({mode})", flush=True)
+    dist.destroy_process_group()
+
+if __name__ == "__main__":
+    mp.set_start_method("spawn", force=True)
+    mp.spawn(worker, args=(4, "dual", 5, 29560), nprocs=4, join=True)
+```
+
+Switching `"dual"` to `"shared"` lets you observe the interleave/block when two threads share one group. The repo's `dual_prefetch_groups_demo.py` is its full version, with a watchdog `join(timeout)` to explicitly detect hangs, plus an `uneven` mode (corresponding to the "unequal call count" failure).
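+
+The watchdog idea mentioned above can be illustrated in isolation. This is a minimal sketch and not the contents of `dual_prefetch_groups_demo.py`: `finished_within` is a hypothetical helper, and a `threading.Event` that is never set stands in for a collective call that never finds its partner.
+
+```python
+import threading
+
+
+def finished_within(target, timeout_s: float) -> bool:
+    """Run target in a daemon thread; report whether it finished in time."""
+    t = threading.Thread(target=target, daemon=True)
+    t.start()
+    t.join(timeout_s)  # returns after the timeout even if the thread is blocked
+    return not t.is_alive()
+
+
+never = threading.Event()  # never set: models a rendezvous that never pairs
+
+ok_case = finished_within(lambda: None, timeout_s=1.0)
+hung_case = finished_within(lambda: never.wait(), timeout_s=0.2)
+print(ok_case, hung_case)  # prints: True False
+```
+
+Because `join(timeout)` returns whether or not the thread has finished, the caller has to check `is_alive()` afterwards; marking the thread as a daemon lets the process exit even though the blocked thread never returns.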
+
+### 7.4 How this technique solves the PP + L3 problem
+
+Putting this technique back into context: under PP + L3, the host tree diverges because the two divergence quantities `storage_hit_count` and `completed_tokens` are not unified across **TP + PP** (Section 4). To unify them requires **two cross-rank MINs**; these two MINs naturally occur at two different stages of the prefetch pipeline—the query stage and the load stage—and must run on **background threads** so as not to block the GPU forward on the scheduler main thread. Hence:
+
+- **two divergence quantities → two background MINs → two concurrent background threads**, an inevitable result of aligning the PP/TP dimension;
+- **two concurrent threads running collectives at once → two communicators are mandatory**, otherwise per 7.1 they inevitably interleave/deadlock;
+- both groups use **gloo (CPU)**, isolated from the main thread's NCCL (GPU), so background alignment does not occupy the forward pass's CUDA collective stream;
+- combined with Section 6.3's "exactly one ack per batch + qsize alignment", PG2's reduction count is equal on every rank, so neither group hangs.
+
+In the end: PG1 makes all TP+PP ranks prefetch the same range, PG2 makes them land the same length, each rank inserts a **prefix of equal length** into the host tree, the host tree stays consistent across PP stages, and the shape mismatch crash of Section 4 is eliminated at the root. In other words, **"two groups" is not concurrency for concurrency's sake, but a concurrency-safe design forced out by the correctness requirement that "both divergence quantities must cover PP".**
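+
+The effect of the two MINs on the inserted length can be traced with plain lists. This is a minimal sketch under stated assumptions, not SGLang code: `all_reduce_min` is a hypothetical stand-in for `all_reduce(MIN)`, the per-rank numbers are made up, and both quantities are assumed to be expressed in the same unit.
+
+```python
+def all_reduce_min(per_rank):
+    """Stand-in for all_reduce(MIN): every rank receives the global minimum."""
+    lowest = min(per_rank)
+    return [lowest] * len(per_rank)
+
+
+def unified_insert_lengths(storage_hit_count, completed_tokens):
+    # PG1: agree on the prefetch range before any rank starts loading.
+    prefetch_range = all_reduce_min(storage_hit_count)
+    # A rank cannot land more than the agreed range.
+    landed = [min(r, c) for r, c in zip(prefetch_range, completed_tokens)]
+    # PG2: agree on the landed length before inserting into the host tree.
+    return all_reduce_min(landed)
+
+
+if __name__ == "__main__":
+    # Hypothetical values for 4 ranks (TP + PP flattened).
+    hits = [4096, 3072, 4096, 3584]
+    done = [3072, 2048, 3072, 3072]
+    print(unified_insert_lengths(hits, done))
+```
+
+With these inputs, every rank ends up with the same insertion length, namely the smallest amount that any rank actually landed.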
+
+### 7.5 The demo has no pp_sync: why it still won't deadlock, and which part of the demo pp_sync corresponds to
+
+To be explicit: **the demo above only reproduces PG1/PG2, the two background symmetric all_reduces, and deliberately excludes pp_sync (channel 3).** It still won't deadlock because whether a collective can deadlock depends on only two things, both of which are already in the demo and have nothing to do with pp_sync:
+
+1. **Two independent groups (`g1 != g2`)**: the two concurrent threads each own a communicator and never interleave. This corresponds to `dual` vs `shared`: only `shared`, where both threads use one group, interleaves or blocks.
+2. **Each rank makes an equal number `n` of `all_reduce` calls**: the rendezvous pairs one-to-one. This corresponds to `dual` vs `uneven`: `uneven` makes one rank call `all_reduce` one time fewer, so the remaining ranks wait forever for a matching call and hang.
+
+In the demo, `n` is a constant passed in directly; in the real system, `n` (= PG2's reduction count = batch count = ack count) is guaranteed equal per rank by **PG1 unifying the `hash_value` length + Section 6.3's "exactly one ack per batch"**. **Key: it's these two invariants that make `n` equal, not pp_sync.**
+
+So which part of the demo does pp_sync correspond to? **The answer: it corresponds to none of the demo's `all_reduce` calls, but to the "downstream stage" the demo deliberately omits.** The demo ends after `reduce_loop`; the real system, after the two background MINs, still has the scheduler main thread's step of "consume acks, write the host tree" (Section 6.5, `drain_storage_control_queues` → `_handle_prefetch_result`). pp_sync (channel 3) acts exactly at this step: PP0 decides how many acks to consume this cycle and when to terminate, then broadcasts unidirectionally along the PP ring, so each rank consumes the **same number** of completion events and keeps the tree-write action sequence consistent.
+
+```text
+demo covers:    [two background threads × two sets × n all_reduce each]   <- PG1 / PG2
+                          │
+demo omits:     ──────────▼──────────▶  [main thread: consume acks + write host tree]   <- pp_sync is here
+            PG1/PG2 guarantee "compute a unified length and don't deadlock"   pp_sync guarantees "equal consume count, consistent write sequence"
+```
+
+**Separation of duties: shape is governed by the two MINs, count/sequencing by pp_sync—these are two separate things, governed by different mechanisms in this fix; don't conflate them.**
+
+- **PG1 / PG2 → shape (each request's "how long to insert" is identical)**: PG1 does MIN on `storage_hit_count`, unifying the prefetch range → same `hash_value` length, same batch count; PG2 does MIN on `completed_tokens`, unifying the landed length → equal-length inserted prefixes. By this point, the "how long to insert" computed on each PP rank is already equal per rank—**shape consistency is entirely the work of PG1 + PG2, unrelated to pp_sync**.
+- **pp_sync (channel 3) → count / sequencing (this cycle's "how many requests to process, in what order, when to stop" is identical)**: in `drain_storage_control_queues`, the main thread takes some completion events from the queues each cycle to write the tree, but the number of acks (qsize) piled up in each rank's queue can differ—this is **not a length divergence, but a divergence in "how many requests each intends to process this cycle"**. pp_sync makes PP0 decide the consume count for this cycle and broadcast it along the PP ring, so each rank consumes the same number of completion events this cycle and keeps the tree-write action sequence aligned.
+
+| mechanism | guarantees | dimension |
+| --- | --- | --- |
+| PG1 + PG2 | each request's insertion length is identical | shape / length |
+| pp_sync (channel 3) | this cycle: how many to process, in what order, when to stop | count / sequencing |
+
+Finally, pp_sync itself won't introduce deadlock: it is not a symmetric global all_reduce, but a **unidirectional P2P relay** on the PP ring (`recv` from upstream → non-blocking `isend` downstream), needing no global rendezvous, so it neither competes with those two all_reduces for a communicator nor stalls on "who arrives first" (see Section 5 channel 3, Section 6.5).
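+
+The count alignment that pp_sync provides can be illustrated with threads and queues in place of PP stages and P2P sends. This is a simplified sketch, not the SGLang implementation: `pp_stage` and `run_cycle` are hypothetical names, the leader's rule (propose its own backlog) is an assumption, and the relay is modeled as a chain from PP0 to the last stage.
+
+```python
+import queue
+import threading
+
+
+def pp_stage(rank, pp_size, links, acks, consumed):
+    if rank == 0:
+        # Assumption: the leader proposes its own backlog as the drain count.
+        count = acks[0].qsize()
+    else:
+        count = links[rank - 1].get()  # blocking recv from the upstream stage
+    if rank + 1 < pp_size:
+        links[rank].put(count)  # hand off downstream without waiting for it
+    consumed[rank] = [acks[rank].get() for _ in range(count)]
+
+
+def run_cycle(backlogs):
+    """Simulate one drain cycle; backlogs[i] is the ack count queued on PP stage i."""
+    pp_size = len(backlogs)
+    links = [queue.Queue() for _ in range(pp_size - 1)]
+    acks = [queue.Queue() for _ in range(pp_size)]
+    for stage_acks, backlog in zip(acks, backlogs):
+        for i in range(backlog):
+            stage_acks.put(f"ack{i}")
+    consumed = [None] * pp_size
+    threads = [
+        threading.Thread(target=pp_stage, args=(r, pp_size, links, acks, consumed))
+        for r in range(pp_size)
+    ]
+    for t in threads:
+        t.start()
+    for t in threads:
+        t.join()
+    return consumed
+
+
+if __name__ == "__main__":
+    # Stages hold 2, 3 and 4 acks; all of them drain the same number this cycle.
+    print([len(c) for c in run_cycle([2, 3, 4])])
+```
+
+The sketch assumes every stage already holds at least as many acks as the leader proposes; otherwise the blocking `get()` would wait for more acks to arrive. Each stage only waits on its upstream neighbor, so no global rendezvous is involved.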
+
+## 8. Summary
+
+The host tree consistency problem of PP + HiCache (L3) is rooted in the fact that the host tree's "growth" is jointly determined by two per-rank-divergent quantities (`storage_hit_count`, `completed_tokens`), while the main branch unified only one of them and did not cover PP. The core of the fix can be summarized in three points:
+
+- **Two independent divergence sources → two symmetric MINs**: PG1 unifies the prefetch range, PG2 unifies the landed length, and both cover TP + PP.
+- **Two concurrent background threads → two independent gloo groups**: this avoids the misaligned concurrent reductions and deadlocks caused by sharing a communicator, and keeps the background reductions off NCCL on the GPU.
+- **Exactly one PrefetchAck per batch + channel 3 qsize alignment**: this guarantees that the collective call count and the completion-event consume count are equal on every rank, which both prevents hangs and keeps the tree-write action sequence consistent.
+
+Augmented by the radix tree state self-check guardrail, the host radix tree on each PP rank stays byte-for-byte identical in inserts and deletes, severing at the root the divergence feedback loop described in Section 3.3.
+
+## Acknowledgement
+
+We would like to thank the SGLang team and community for the implementation and generous support, especially **Zhangheng Huang**, **Shangming Cai**, **Chao Shi**, **Tingwei Huang**, **Yanbo Yang**, **Zhiqiang Xie**, **Lianmin Zheng**, and many others. This work builds directly on the SGLang Pipeline Parallelism design and the HiCache three-level KV cache hierarchy, from which it inherits its architecture and to which it contributes the PP host-tree consistency fix.
diff --git a/public/images/blog/pp_hicache_consistency/ackalign.gif b/public/images/blog/pp_hicache_consistency/ackalign.gif
new file mode 100644
index 000000000..36218216a
Binary files /dev/null and b/public/images/blog/pp_hicache_consistency/ackalign.gif differ
diff --git a/public/images/blog/pp_hicache_consistency/consistency.gif b/public/images/blog/pp_hicache_consistency/consistency.gif
new file mode 100644
index 000000000..e330cdd4e
Binary files /dev/null and b/public/images/blog/pp_hicache_consistency/consistency.gif differ
diff --git a/public/images/blog/pp_hicache_consistency/deadlock.gif b/public/images/blog/pp_hicache_consistency/deadlock.gif
new file mode 100644
index 000000000..dc52cd2c9
Binary files /dev/null and b/public/images/blog/pp_hicache_consistency/deadlock.gif differ
diff --git a/public/images/blog/pp_hicache_consistency/fix_three_channels.svg b/public/images/blog/pp_hicache_consistency/fix_three_channels.svg
new file mode 100644
index 000000000..993e79ffd
--- /dev/null
+++ b/public/images/blog/pp_hicache_consistency/fix_three_channels.svg
@@ -0,0 +1,77 @@
+<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 860 380" font-family="-apple-system,Segoe UI,Helvetica,Arial,sans-serif">
+  <defs>
+    <marker id="arr" markerWidth="10" markerHeight="10" refX="7" refY="3" orient="auto">
+      <path d="M0,0 L7,3 L0,6 Z" fill="#6b7280"/>
+    </marker>
+  </defs>
+  <rect x="0" y="0" width="860" height="380" fill="#ffffff"/>
+
+  <!-- header -->
+  <text x="430" y="28" text-anchor="middle" font-size="16" font-weight="700" fill="#111827">Three communication channels — one per divergence source</text>
+  <text x="430" y="48" text-anchor="middle" font-size="12" fill="#6b7280">background reductions run on gloo / CPU, isolated from the scheduler's NCCL / GPU stream</text>
+
+  <!-- column headers -->
+  <text x="150" y="78" text-anchor="middle" font-size="11" font-weight="600" fill="#9ca3af">CHANNEL (owner thread · backend)</text>
+  <text x="455" y="78" text-anchor="middle" font-size="11" font-weight="600" fill="#9ca3af">OPERATION</text>
+  <text x="745" y="78" text-anchor="middle" font-size="11" font-weight="600" fill="#9ca3af">UNIFIES / GOVERNS</text>
+
+  <!-- ===== Row 1: PG1 ===== -->
+  <g>
+    <rect x="20" y="92" width="260" height="78" rx="10" fill="#f5f3ff" stroke="#7c3aed" stroke-width="1.5"/>
+    <text x="34" y="116" font-size="13" font-weight="700" fill="#5b21b6">PG1 · prefetch_hits_sync_groups</text>
+    <text x="34" y="135" font-size="11.5" fill="#4b5563">owner: prefetch_thread</text>
+    <text x="34" y="152" font-size="11" fill="#7c3aed">gloo · CPU · TP ring + PP ring</text>
+
+    <line x1="284" y1="131" x2="324" y2="131" stroke="#6b7280" stroke-width="1.5" marker-end="url(#arr)"/>
+
+    <rect x="328" y="100" width="256" height="62" rx="9" fill="#fafafa" stroke="#d1d5db"/>
+    <text x="456" y="126" text-anchor="middle" font-size="12.5" font-weight="600" fill="#111827">all_reduce(MIN)</text>
+    <text x="456" y="146" text-anchor="middle" font-size="12" fill="#374151">storage_hit_count</text>
+
+    <line x1="588" y1="131" x2="628" y2="131" stroke="#6b7280" stroke-width="1.5" marker-end="url(#arr)"/>
+
+    <rect x="632" y="100" width="208" height="62" rx="9" fill="#f5f3ff" stroke="#7c3aed"/>
+    <text x="736" y="124" text-anchor="middle" font-size="12" font-weight="600" fill="#5b21b6">prefetch range</text>
+    <text x="736" y="143" text-anchor="middle" font-size="10.5" fill="#6b7280">hash_value length · batch/ack count</text>
+  </g>
+
+  <!-- ===== Row 2: PG2 ===== -->
+  <g>
+    <rect x="20" y="186" width="260" height="78" rx="10" fill="#ecfeff" stroke="#0891b2" stroke-width="1.5"/>
+    <text x="34" y="210" font-size="13" font-weight="700" fill="#0e7490">PG2 · prefetch_completion_sync_groups</text>
+    <text x="34" y="229" font-size="11.5" fill="#4b5563">owner: prefetch_sync_thread</text>
+    <text x="34" y="246" font-size="11" fill="#0891b2">gloo · CPU · TP ring + PP ring</text>
+
+    <line x1="284" y1="225" x2="324" y2="225" stroke="#6b7280" stroke-width="1.5" marker-end="url(#arr)"/>
+
+    <rect x="328" y="194" width="256" height="62" rx="9" fill="#fafafa" stroke="#d1d5db"/>
+    <text x="456" y="220" text-anchor="middle" font-size="12.5" font-weight="600" fill="#111827">all_reduce(MIN)</text>
+    <text x="456" y="240" text-anchor="middle" font-size="12" fill="#374151">completed_tokens</text>
+
+    <line x1="588" y1="225" x2="628" y2="225" stroke="#6b7280" stroke-width="1.5" marker-end="url(#arr)"/>
+
+    <rect x="632" y="194" width="208" height="62" rx="9" fill="#ecfeff" stroke="#0891b2"/>
+    <text x="736" y="218" text-anchor="middle" font-size="12" font-weight="600" fill="#0e7490">landed length</text>
+    <text x="736" y="237" text-anchor="middle" font-size="10.5" fill="#6b7280">prefix actually inserted into tree</text>
+  </g>
+
+  <!-- ===== Row 3: pp_sync ===== -->
+  <g>
+    <rect x="20" y="280" width="260" height="78" rx="10" fill="#fffbeb" stroke="#d97706" stroke-width="1.5"/>
+    <text x="34" y="304" font-size="13" font-weight="700" fill="#b45309">Channel 3 · pp_sync</text>
+    <text x="34" y="323" font-size="11.5" fill="#4b5563">owner: main thread (PP ring)</text>
+    <text x="34" y="340" font-size="11" fill="#d97706">PP0 decision → downstream broadcast</text>
+
+    <line x1="284" y1="319" x2="324" y2="319" stroke="#6b7280" stroke-width="1.5" marker-end="url(#arr)"/>
+
+    <rect x="328" y="288" width="256" height="62" rx="9" fill="#fafafa" stroke="#d1d5db"/>
+    <text x="456" y="314" text-anchor="middle" font-size="12.5" font-weight="600" fill="#111827">leader decide + relay</text>
+    <text x="456" y="334" text-anchor="middle" font-size="12" fill="#374151">drain count · stop timing</text>
+
+    <line x1="588" y1="319" x2="628" y2="319" stroke="#6b7280" stroke-width="1.5" marker-end="url(#arr)"/>
+
+    <rect x="632" y="288" width="208" height="62" rx="9" fill="#fffbeb" stroke="#d97706"/>
+    <text x="736" y="312" text-anchor="middle" font-size="12" font-weight="600" fill="#b45309">consume count / order</text>
+    <text x="736" y="331" text-anchor="middle" font-size="10.5" fill="#6b7280">how many to process, when to stop</text>
+  </g>
+</svg>
diff --git a/public/images/blog/pp_hicache_consistency/hicache_hierarchy.svg b/public/images/blog/pp_hicache_consistency/hicache_hierarchy.svg
new file mode 100644
index 000000000..5592a16f9
--- /dev/null
+++ b/public/images/blog/pp_hicache_consistency/hicache_hierarchy.svg
@@ -0,0 +1,41 @@
+<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 860 240" font-family="-apple-system,Segoe UI,Helvetica,Arial,sans-serif">
+  <defs>
+    <marker id="ar" markerWidth="10" markerHeight="10" refX="7" refY="3" orient="auto">
+      <path d="M0,0 L7,3 L0,6 Z" fill="#6b7280"/>
+    </marker>
+  </defs>
+  <rect x="0" y="0" width="860" height="240" fill="#ffffff"/>
+  <text x="430" y="28" text-anchor="middle" font-size="15" font-weight="700" fill="#111827">HiCache — a three-level KV cache hierarchy</text>
+
+  <!-- L1 -->
+  <rect x="30" y="70" width="220" height="100" rx="12" fill="#eff6ff" stroke="#2563eb" stroke-width="1.5"/>
+  <text x="140" y="100" text-anchor="middle" font-size="14" font-weight="700" fill="#1d4ed8">L1 · GPU memory</text>
+  <text x="140" y="122" text-anchor="middle" font-size="11.5" fill="#4b5563">device KV cache</text>
+  <text x="140" y="140" text-anchor="middle" font-size="11.5" fill="#4b5563">(fastest, smallest)</text>
+
+  <!-- L2 -->
+  <rect x="320" y="70" width="220" height="100" rx="12" fill="#ecfdf5" stroke="#059669" stroke-width="1.5"/>
+  <text x="430" y="100" text-anchor="middle" font-size="14" font-weight="700" fill="#047857">L2 · CPU memory</text>
+  <text x="430" y="122" text-anchor="middle" font-size="11.5" fill="#4b5563">host radix tree</text>
+  <text x="430" y="140" text-anchor="middle" font-size="11.5" fill="#4b5563">(per-rank, must stay identical)</text>
+
+  <!-- L3 -->
+  <rect x="610" y="70" width="220" height="100" rx="12" fill="#fff7ed" stroke="#ea580c" stroke-width="1.5"/>
+  <text x="720" y="100" text-anchor="middle" font-size="14" font-weight="700" fill="#c2410c">L3 · external store</text>
+  <text x="720" y="122" text-anchor="middle" font-size="11.5" fill="#4b5563">e.g. Mooncake</text>
+  <text x="720" y="140" text-anchor="middle" font-size="11.5" fill="#4b5563">persistent across restarts</text>
+
+  <!-- arrows L1<->L2 -->
+  <line x1="252" y1="108" x2="318" y2="108" stroke="#6b7280" stroke-width="1.5" marker-end="url(#ar)"/>
+  <line x1="318" y1="132" x2="252" y2="132" stroke="#6b7280" stroke-width="1.5" marker-end="url(#ar)"/>
+  <text x="285" y="100" text-anchor="middle" font-size="10" fill="#6b7280">evict</text>
+  <text x="285" y="152" text-anchor="middle" font-size="10" fill="#6b7280">load-back</text>
+
+  <!-- arrows L2<->L3 -->
+  <line x1="542" y1="108" x2="608" y2="108" stroke="#6b7280" stroke-width="1.5" marker-end="url(#ar)"/>
+  <line x1="608" y1="132" x2="542" y2="132" stroke="#6b7280" stroke-width="1.5" marker-end="url(#ar)"/>
+  <text x="575" y="100" text-anchor="middle" font-size="10" fill="#6b7280">backup</text>
+  <text x="575" y="152" text-anchor="middle" font-size="10" fill="#6b7280">prefetch</text>
+
+  <text x="430" y="206" text-anchor="middle" font-size="11.5" fill="#6b7280">L3 persistence lets shared prefixes be reused across requests <tspan font-style="italic">and</tspan> across process restarts.</text>
+</svg>
diff --git a/public/images/blog/pp_hicache_consistency/hicache_pp_animation_en_ackalign.html b/public/images/blog/pp_hicache_consistency/hicache_pp_animation_en_ackalign.html
new file mode 100644
index 000000000..2b4baa90e
--- /dev/null
+++ b/public/images/blog/pp_hicache_consistency/hicache_pp_animation_en_ackalign.html
@@ -0,0 +1,1559 @@
+<!DOCTYPE html>
+<html lang="zh-CN">
+<head>
+<meta charset="UTF-8" />
+<meta name="viewport" content="width=device-width, initial-scale=1.0" />
+<title>HiCache × PP=3 · TP=8: Tree Consistency & Deadlock Prevention Animation</title>
+<style>
+  :root{
+    --bg:#0e1117; --panel:#161b22; --panel2:#1c2330; --line:#30363d;
+    --text:#e6edf3; --muted:#8b949e;
+    --blue:#58a6ff; --green:#3fb950; --red:#f85149; --amber:#d29922;
+    --purple:#bc8cff; --cyan:#56d4dd;
+  }
+  *{box-sizing:border-box;}
+  body{
+    margin:0; background:radial-gradient(1200px 600px at 50% -10%, #18202c, var(--bg));
+    color:var(--text); font-family:-apple-system,BlinkMacSystemFont,"Segoe UI","PingFang SC","Microsoft YaHei",sans-serif;
+    line-height:1.5;
+  }
+  #langBtn{ position:fixed; top:14px; right:16px; z-index:50; background:var(--panel2); border:1px solid var(--blue);
+    color:var(--blue); padding:7px 14px; border-radius:999px; cursor:pointer; font-size:13px; font-weight:600; }
+  #langBtn:hover{ background:var(--blue); color:#04101f; }
+  header{ text-align:center; padding:20px 16px 4px; }
+  header h1{ margin:0 0 4px; font-size:21px; }
+  header p{ margin:0; color:var(--muted); font-size:13px; }
+  .tabs{ display:flex; gap:8px; justify-content:center; margin:16px auto 8px; flex-wrap:wrap; }
+  .tab{ background:var(--panel); border:1px solid var(--line); color:var(--text);
+    padding:9px 16px; border-radius:999px; cursor:pointer; font-size:14px; transition:all .15s; }
+  .tab.active{ background:var(--blue); color:#04101f; border-color:var(--blue); font-weight:600; }
+  .wrap{ max-width:1120px; margin:0 auto; padding:0 16px 60px; }
+  .scene{ background:var(--panel); border:1px solid var(--line); border-radius:14px; padding:18px; position:relative; }
+  .hidden{ display:none; }
+  .controls{ display:flex; gap:10px; justify-content:center; align-items:center; margin:14px 0 4px; flex-wrap:wrap; }
+  button.ctl{ background:var(--panel2); border:1px solid var(--line); color:var(--text);
+    padding:8px 16px; border-radius:8px; cursor:pointer; font-size:14px; }
+  button.ctl:hover{ border-color:var(--blue); }
+  button.ctl.primary{ background:var(--green); color:#04140a; border-color:var(--green); font-weight:600; }
+  button.ctl.alt{ background:var(--red); color:#1a0606; border-color:var(--red); font-weight:600; }
+  .caption{ text-align:center; min-height:44px; margin:10px auto 0; max-width:900px; font-size:15px; }
+  .caption .k{ color:var(--cyan); font-weight:600; }
+  code{ background:#0d1117; padding:1px 6px; border-radius:4px; border:1px solid var(--line); color:var(--cyan); font-size:12px; }
+  .legend{ display:flex; gap:18px; justify-content:center; font-size:12px; color:var(--muted); flex-wrap:wrap; margin-bottom:10px; }
+  .chip{ display:inline-flex; align-items:center; gap:6px; }
+  .sw{ width:14px; height:14px; border-radius:4px; display:inline-block; }
+
+  /* ---------- mesh ---------- */
+  .mesh-head{ display:flex; align-items:center; gap:8px; margin-left:96px; margin-bottom:4px; }
+  .tp-hdr{ flex:1; display:flex; gap:6px; }
+  .tp-hdr .th{ flex:1; text-align:center; font-size:11px; color:var(--muted); }
+  .pp-row{ display:flex; align-items:center; gap:8px; margin-bottom:6px; }
+  .pp-label{ width:88px; font-size:12px; color:var(--muted); text-align:right; line-height:1.2; }
+  .pp-label b{ color:var(--text); }
+  .row-cells{ flex:1; display:flex; gap:6px; border:2px solid transparent; border-radius:10px; padding:3px; transition:border-color .3s, box-shadow .3s; }
+  .row-cells.ring-a{ border-color:var(--purple); box-shadow:0 0 12px rgba(188,140,255,.25); }
+  .row-cells.ring-b{ border-color:var(--cyan); box-shadow:0 0 12px rgba(86,212,221,.25); }
+  .row-cells.ring-bad{ border-color:var(--red); box-shadow:0 0 12px rgba(248,81,73,.3); }
+  .cell{
+    flex:1; height:54px; border-radius:8px; background:#0d1117; border:1px solid var(--line);
+    display:flex; flex-direction:column; align-items:center; justify-content:center; gap:1px;
+    transition:background .3s, border-color .3s, transform .15s, box-shadow .15s; position:relative;
+  }
+  .cell .v{ font-size:18px; font-weight:800; color:var(--muted); transition:color .3s; }
+  .cell .v small{ font-size:9px; font-weight:500; }
+  .cell .rk{ font-size:9px; color:#5b6470; }
+  .cell.varied .v{ color:var(--amber); }
+  .cell.tpmin{ background:linear-gradient(180deg,#143055,#102844); border-color:var(--blue); }
+  .cell.tpmin .v{ color:#cfe6ff; }
+  .cell.gmin{ background:linear-gradient(180deg,#0f3a1d,#0e2c18); border-color:var(--green); }
+  .cell.gmin .v{ color:#c4f7d4; }
+  .cell.sweep{ transform:translateY(-3px); box-shadow:0 0 14px rgba(88,166,255,.55); border-color:var(--blue); }
+  .cell.bad{ background:linear-gradient(180deg,#3a1414,#2a1010); border-color:var(--red); }
+  .cell.bad .v{ color:#ffd4d0; }
+  .cell.dim{ opacity:.35; }
+  /* thread dots */
+  .tdots{ display:flex; gap:5px; margin-top:1px; }
+  .td{ width:9px; height:9px; border-radius:50%; border:1px solid var(--line); background:#0d1117; transition:all .25s; }
+  .td.a{ border-color:var(--purple); }
+  .td.b{ border-color:var(--cyan); }
+  .td.a.on{ background:var(--purple); box-shadow:0 0 8px var(--purple); }
+  .td.b.on{ background:var(--cyan); box-shadow:0 0 8px var(--cyan); }
+  .td.done{ background:var(--green); border-color:var(--green); box-shadow:0 0 6px var(--green); }
+  .td.dead{ background:var(--red); border-color:var(--red); box-shadow:0 0 6px var(--red); }
+
+  /* pp-group footer (columns) */
+  .pp-foot{ display:flex; align-items:center; gap:8px; margin-top:6px; }
+  .pp-foot .lab{ width:88px; font-size:11px; color:var(--muted); text-align:right; }
+  .pp-foot .cols{ flex:1; display:flex; gap:6px; padding:0 3px; }
+  .pp-foot .col{ flex:1; height:20px; border-radius:6px; border:1px dashed var(--line); font-size:9px;
+    color:#5b6470; display:flex; align-items:center; justify-content:center; transition:all .3s; }
+  .pp-foot .col.ring-a{ border-color:var(--purple); color:#e3d3ff; }
+  .pp-foot .col.ring-b{ border-color:var(--cyan); color:#cdf6fa; }
+  .pp-foot .col.ring-bad{ border-color:var(--red); color:#ffd4d0; }
+
+  /* shared tree (tab1) */
+  .tree-box{ margin-top:14px; display:flex; flex-direction:column; align-items:center; }
+  .tree-title{ font-size:12px; color:var(--muted); margin-bottom:6px; }
+  .tree{ display:flex; flex-direction:column; align-items:center; gap:4px; min-height:30px; }
+  .tnode{ width:220px; height:20px; border-radius:5px; background:#0d1117; border:1px solid var(--line);
+    display:flex; align-items:center; justify-content:center; font-size:11px; color:var(--muted);
+    opacity:0; transform:translateY(-6px); transition:opacity .25s, transform .25s; }
+  .tnode.show{ opacity:1; transform:none; background:linear-gradient(90deg,#0f3a1d,#196b32); border-color:var(--green); color:#c4f7d4; }
+
+  /* groups panel (tab2) */
+  .groups{ display:flex; gap:16px; justify-content:center; margin-top:12px; flex-wrap:wrap; }
+  .grp{ border:1px dashed var(--line); border-radius:10px; padding:8px 14px; font-size:12px; color:var(--muted);
+    min-width:230px; text-align:center; transition:all .3s; }
+  .grp b{ color:var(--text); }
+  .grp.g1.hot{ border-color:var(--purple); color:#e3d3ff; box-shadow:0 0 14px rgba(188,140,255,.25); }
+  .grp.g2.hot{ border-color:var(--cyan); color:#cdf6fa; box-shadow:0 0 14px rgba(86,212,221,.25); }
+  .banner{ text-align:center; font-weight:800; font-size:18px; min-height:24px; margin-top:8px; }
+  .banner.ok{ color:var(--green); } .banner.bad{ color:var(--red); }
+  .note{ font-size:12px; color:var(--muted); text-align:center; margin-top:4px; }
+
+  /* ---------- tab3: async skew x MIN ---------- */
+  .t3-section{ background:var(--panel2); border:1px solid var(--line); border-radius:12px; padding:10px 12px; margin-bottom:14px; }
+  .t3-title{ font-size:13px; margin-bottom:8px; display:flex; align-items:center; gap:8px; }
+  .t3-title .tag{ font-size:10px; padding:2px 8px; border-radius:999px; border:1px solid var(--line); color:var(--muted); }
+  .t3-title .tag.gpu{ border-color:var(--blue); color:#cfe6ff; }
+  .t3-title .tag.cpu{ border-color:var(--amber); color:#ffe2ab; }
+  /* top pipeline */
+  .pipe{ position:relative; }
+  .lane{ position:relative; height:34px; margin:6px 0; border-radius:8px; background:#0d1117;
+    border:1px solid var(--line); overflow:hidden; }
+  .lane .lname{ position:absolute; left:8px; top:50%; transform:translateY(-50%); font-size:11px; color:var(--muted); z-index:3; }
+  .mb{ position:absolute; top:6px; height:22px; width:52px; border-radius:6px; z-index:2;
+    display:flex; align-items:center; justify-content:center; font-size:10px; font-weight:700;
+    opacity:0; color:#c4f7d4; border:1px solid var(--green);
+    background:linear-gradient(180deg,#0f3a1d,#15311f); transition:opacity .18s; }
+  .lane-hint{ position:absolute; right:10px; top:50%; transform:translateY(-50%); font-size:10px; color:#46505e; z-index:1; }
+  .flow-note{ font-size:11px; color:var(--green); text-align:right; margin-top:2px; }
+
+  /* bottom sync area */
+  .sync-area{ position:relative; height:170px; border-radius:8px; background:#0d1117; border:1px solid var(--line); overflow:hidden; }
+  .barrier{ position:absolute; top:6px; bottom:6px; width:0; border-left:2px dashed var(--amber); z-index:2; }
+  .barrier .blabel{ position:absolute; top:-2px; left:8px; font-size:11px; color:var(--amber); white-space:nowrap; }
+  .barrier.fire{ border-left-color:var(--green); box-shadow:-2px 0 18px rgba(63,185,80,.5); }
+  .slane{ position:absolute; left:0; right:0; height:1px; border-top:1px dashed #1f2733; }
+  .slabel{ position:absolute; left:8px; font-size:11px; color:var(--muted); z-index:3; transform:translateY(-50%); }
+  .pkt{ position:absolute; height:30px; width:108px; border-radius:8px; z-index:3; transform:translateY(-50%);
+    display:flex; align-items:center; justify-content:center; gap:6px; font-size:11px; font-weight:700;
+    border:1px solid var(--line); background:#11161f; transition:background .25s, border-color .25s, box-shadow .25s; }
+  .pkt .hv{ font-size:14px; }
+  .pkt.travel{ border-color:var(--blue); color:#cfe6ff; }
+  .pkt.wait{ border-color:var(--amber); color:#ffe2ab; background:#1d1808; animation:pulse 1s infinite; }
+  .pkt.unified{ border-color:var(--green); color:#c4f7d4; background:linear-gradient(180deg,#0f3a1d,#0e2c18); box-shadow:0 0 12px rgba(63,185,80,.35); }
+  .clock{ position:absolute; right:10px; top:8px; font-size:11px; color:var(--muted); z-index:4; }
+  /* causal arrows + batch formers */
+  .t3-conduit{ position:relative; height:38px; margin:-2px 0 8px; display:flex; align-items:center; justify-content:center; }
+  .t3-conduit .clabel{ font-size:12px; color:var(--muted); transition:color .3s; z-index:2; background:var(--panel); padding:0 8px; }
+  .t3-conduit.hot .clabel{ color:var(--green); }
+  .t3-conduit::before{ content:""; position:absolute; left:50%; top:4px; bottom:4px; width:2px; transform:translateX(-50%);
+    background:repeating-linear-gradient(to top, #2b3340 0 6px, transparent 6px 12px); }
+  .t3-conduit.hot::before{ background:repeating-linear-gradient(to top, var(--green) 0 6px, transparent 6px 12px); opacity:.5; }
+  .t3-conduit .spark{ position:absolute; left:50%; bottom:3px; width:11px; height:11px; border-radius:50%;
+    background:var(--green); opacity:0; box-shadow:0 0 12px var(--green); transform:translateX(-50%); z-index:3; }
+  .t3-conduit.hot .spark{ animation:rise .85s linear infinite; }
+  .t3-conduit.hot .spark.s2{ animation-delay:.42s; }
+  @keyframes rise{ 0%{opacity:0; transform:translate(-50%,0);} 15%{opacity:1;} 100%{opacity:0; transform:translate(-50%,-34px);} }
+  .pipe.fed .lane{ border-color:var(--green); box-shadow:0 0 10px rgba(63,185,80,.25); }
+  .formers{ display:flex; gap:14px; justify-content:center; flex-wrap:wrap; }
+  .former{ flex:1; min-width:240px; max-width:320px; background:#0d1117; border:1px solid var(--line);
+    border-radius:10px; padding:10px 12px; transition:border-color .3s, box-shadow .3s; }
+  .former h5{ margin:0 0 6px; font-size:13px; }
+  .former .hitbox{ font-size:12px; color:var(--muted); margin-bottom:8px; }
+  .former .hitbox b{ color:var(--amber); font-size:14px; }
+  .former.ready{ border-color:var(--green); box-shadow:0 0 12px rgba(63,185,80,.2); }
+  .former.ready .hitbox b{ color:var(--green); }
+  .mbrow{ display:flex; gap:6px; }
+  .mbchip{ flex:1; height:24px; border-radius:6px; background:#11161f; border:1px solid var(--line);
+    display:flex; align-items:center; justify-content:center; font-size:11px; color:var(--muted); opacity:.3; transition:all .25s; }
+  .mbchip.on{ opacity:1; background:linear-gradient(180deg,#143055,#102844); border-color:var(--blue); color:#cfe6ff; }
+  .mbchip.fixed{ opacity:1; background:linear-gradient(180deg,#0f3a1d,#0e2c18); border-color:var(--green); color:#c4f7d4; }
+  .former .chk{ font-size:12px; color:var(--green); margin-top:6px; min-height:16px; }
+
+  /* ---------- tab4: thread relationships ---------- */
+  .t4wrap{ display:flex; gap:18px; flex-wrap:wrap; }
+  .t4flow{ flex:2; min-width:340px; display:flex; flex-direction:column; align-items:stretch; gap:0; }
+  .t4why{ flex:1; min-width:280px; display:flex; flex-direction:column; gap:10px; }
+  .tbox{ border:1px solid var(--line); border-radius:10px; padding:10px 12px; background:#0d1117; position:relative; }
+  .tbox .tname{ font-size:14px; font-weight:700; display:flex; align-items:center; gap:8px; }
+  .tbox .tdesc{ font-size:11.5px; color:var(--muted); margin-top:3px; }
+  .tbox .pin{ font-size:10px; padding:1px 7px; border-radius:999px; border:1px solid var(--line); color:var(--muted); }
+  .tbox.thread-hit{ border-left:4px solid var(--purple); }
+  .tbox.thread-io{ border-left:4px solid var(--blue); }
+  .tbox.thread-sync{ border-left:4px solid var(--cyan); }
+  .tbox.sched{ border-left:4px solid var(--muted); background:#11161f; }
+  .tarrow{ text-align:center; color:var(--muted); font-size:11px; padding:5px 0; position:relative; }
+  .tarrow b{ color:var(--text); }
+  .minnode{ align-self:center; margin:6px 0; border:1.5px solid var(--amber); border-radius:999px;
+    padding:7px 16px; font-size:12px; color:#ffe2ab; background:#1d1808; font-weight:600; }
+  .minnode.g2{ border-color:var(--green); color:#c4f7d4; background:#0f2a18; }
+  .minnode small{ display:block; font-size:10px; color:var(--muted); font-weight:400; }
+  .whycard{ border:1px solid var(--line); border-radius:10px; padding:11px 13px; background:var(--panel2); }
+  .whycard h4{ margin:0 0 6px; font-size:13px; }
+  .whycard p{ margin:0; font-size:12px; color:var(--muted); }
+  .whycard.a{ border-left:4px solid var(--purple); }
+  .whycard.b{ border-left:4px solid var(--cyan); }
+  .whycard.c{ border-left:4px solid var(--green); }
+  .whycard code{ font-size:11px; }
+
+  /* ---------- tab4 animation states ---------- */
+  #scene4 .tbox,#scene4 .tarrow,#scene4 .minnode,#scene4 .whycard{ transition:all .3s; }
+  #scene4 .dimmed{ opacity:.3; }
+  .tbox.lit{ box-shadow:0 0 18px rgba(88,166,255,.45); transform:translateX(5px); }
+  .tbox.thread-hit.lit{ box-shadow:0 0 18px rgba(188,140,255,.55); }
+  .tbox.thread-io.lit{ box-shadow:0 0 18px rgba(88,166,255,.55); }
+  .tbox.thread-sync.lit{ box-shadow:0 0 18px rgba(86,212,221,.55); }
+  .tbox.sched.lit{ box-shadow:0 0 18px rgba(63,185,80,.4); }
+  .tarrow.lit{ color:var(--green); font-weight:700; }
+  .tarrow.lit b{ color:var(--green); }
+  .tarrow.lit::after{ content:" ●"; color:var(--green); animation:t4blink .7s infinite; }
+  @keyframes t4blink{ 0%,100%{opacity:.2;} 50%{opacity:1;} }
+  .minnode.lit{ box-shadow:0 0 22px rgba(210,153,34,.65); transform:scale(1.06); }
+  .minnode.g2.lit{ box-shadow:0 0 22px rgba(63,185,80,.65); }
+  .whycard.lit{ box-shadow:0 0 18px rgba(63,185,80,.35); transform:translateY(-3px); border-left-width:6px; }
+
+  /* ---------- tab5: two-request full lifecycle story ---------- */
+  #scene5 .story-top{ display:flex; gap:14px; align-items:stretch; margin-bottom:14px; }
+  .gpu-badge{ width:120px; border:1px solid var(--line); border-radius:12px; background:#0d1117;
+    display:flex; flex-direction:column; align-items:center; justify-content:center; gap:2px;
+    font-size:12px; color:var(--muted); transition:all .3s; }
+  .gpu-badge .ic{ font-size:22px; }
+  .gpu-badge.busy{ border-color:var(--blue); color:#cfe6ff; box-shadow:0 0 18px rgba(88,166,255,.45);
+    background:linear-gradient(180deg,#143055,#102844); animation:gpupulse 1s infinite; }
+  @keyframes gpupulse{ 0%,100%{box-shadow:0 0 12px rgba(88,166,255,.3);} 50%{box-shadow:0 0 22px rgba(88,166,255,.6);} }
+  .l3box{ flex:1; border:1px solid var(--line); border-radius:12px; background:#0d1117; padding:8px 12px; transition:all .3s; }
+  .l3box.hot{ border-color:var(--green); box-shadow:0 0 14px rgba(63,185,80,.25); }
+  .l3title{ font-size:12px; color:var(--muted); margin-bottom:6px; display:flex; justify-content:space-between; align-items:center; }
+  .l3title .badge{ font-size:10px; padding:1px 8px; border-radius:999px; border:1px solid var(--line); color:var(--muted); min-width:46px; text-align:center; }
+  .l3title .badge.miss{ border-color:var(--red); color:#ffd4d0; }
+  .l3title .badge.hit{ border-color:var(--green); color:#c4f7d4; }
+  .pagerow{ display:flex; gap:6px; flex-wrap:wrap; min-height:28px; }
+  .pg{ width:38px; height:26px; border-radius:6px; border:1px solid var(--line); background:#11161f;
+    display:flex; align-items:center; justify-content:center; font-size:11px; color:#5b6470;
+    transition:all .3s; opacity:0; transform:scale(.6); }
+  .pg.show{ opacity:1; transform:none; }
+  .pg.l3{ border-color:var(--green); color:#c4f7d4; background:linear-gradient(180deg,#0f3a1d,#0e2c18); }
+
+  #scene5 .ranks{ display:flex; flex-direction:column; gap:10px; }
+  .ranklane{ border:1px solid var(--line); border-radius:12px; padding:8px 12px; background:var(--panel2); transition:all .3s; }
+  .ranklane.active{ border-color:var(--blue); box-shadow:0 0 12px rgba(88,166,255,.22); }
+  .rankhdr{ display:flex; align-items:center; gap:10px; margin-bottom:6px; font-size:12px; }
+  .rankhdr .rname{ font-weight:700; color:var(--text); }
+  .rankhdr .tps{ display:flex; gap:4px; }
+  .rankhdr .tp{ width:8px; height:8px; border-radius:50%; background:#11161f; border:1px solid var(--line); transition:all .25s; }
+  .rankhdr .tp.on{ background:var(--blue); border-color:var(--blue); box-shadow:0 0 6px var(--blue); }
+  .rankhdr .rstat{ margin-left:auto; font-size:11px; padding:1px 9px; border-radius:999px; border:1px solid var(--line); color:var(--muted); }
+  .rankhdr .rstat.miss{ border-color:var(--red); color:#ffd4d0; }
+  .rankhdr .rstat.hit{ border-color:var(--green); color:#c4f7d4; }
+  .rankhdr .rstat.warn{ border-color:var(--amber); color:#ffe2ab; }
+  .htree{ display:flex; align-items:center; gap:8px; min-height:28px; }
+  .htree .root{ font-size:10px; color:var(--muted); padding:2px 7px; border:1px dashed var(--line); border-radius:6px; }
+  .htnode{ width:40px; height:26px; border-radius:6px; border:1px solid var(--line); background:#11161f;
+    display:flex; align-items:center; justify-content:center; font-size:11px; color:#5b6470; position:relative;
+    transition:all .35s; opacity:0; transform:translateY(-6px) scale(.7); }
+  .htnode::before{ content:""; position:absolute; left:-8px; top:50%; width:8px; height:1px; background:var(--line); }
+  .htnode:first-of-type::before{ display:none; }
+  .htnode.show{ opacity:1; transform:none; }
+  .htnode.committed{ border-color:var(--green); color:#c4f7d4; background:linear-gradient(180deg,#0f3a1d,#0e2c18); }
+  .htnode.inserting{ border-color:var(--blue); color:#cfe6ff; background:linear-gradient(180deg,#143055,#102844); }
+  .htnode.matched{ border-color:var(--cyan); color:#cdf6fa; box-shadow:0 0 8px rgba(86,212,221,.4); }
+  .htnode.warn{ border-color:var(--amber); color:#ffe2ab; background:#1d1808; }
+  .htnode.evict{ border-color:var(--red); color:#ffd4d0; background:linear-gradient(180deg,#3a1414,#2a1010); }
+  .story-sync{ display:flex; gap:14px; justify-content:center; margin:14px 0 6px; flex-wrap:wrap; }
+  .syncbadge{ font-size:12px; padding:6px 14px; border-radius:999px; border:1.5px solid var(--line); color:var(--muted); transition:all .3s; }
+  .syncbadge.g1.fire{ border-color:var(--amber); color:#ffe2ab; background:#1d1808; box-shadow:0 0 16px rgba(210,153,34,.45); transform:scale(1.04); }
+  .syncbadge.g2.fire{ border-color:var(--green); color:#c4f7d4; background:#0f2a18; box-shadow:0 0 16px rgba(63,185,80,.45); transform:scale(1.04); }
+  .consist-flag{ text-align:center; font-weight:700; font-size:14px; min-height:20px; margin-top:4px; }
+  .consist-flag.ok{ color:var(--green); } .consist-flag.bad{ color:var(--red); }
+
+  /* ---------- tab6: PrefetchAck alignment & anti-hang ---------- */
+  #scene6 .ackmesh{ display:flex; flex-direction:column; gap:8px; margin-bottom:6px; }
+  .ackrow{ display:flex; align-items:center; gap:10px; border:1px solid var(--line); border-radius:10px; padding:6px 10px; background:var(--panel2); transition:all .3s; }
+  .ackrow.blocked{ border-color:var(--red); box-shadow:0 0 12px rgba(248,81,73,.3); }
+  .acklabel{ width:190px; font-size:12px; color:var(--muted); }
+  .acklabel b{ color:var(--text); }
+  .ackslots{ flex:1; display:flex; gap:8px; }
+  .ackchip{ flex:1; height:34px; border-radius:8px; border:1px solid var(--line); background:#0d1117;
+    display:flex; align-items:center; justify-content:center; font-size:11px; color:#5b6470; gap:5px;
+    transition:all .3s; position:relative; }
+  .ackchip .err{ font-size:9px; color:var(--red); }
+  .ackchip.pending{ opacity:.4; }
+  .ackchip.emit{ border-color:var(--blue); color:#cfe6ff; background:linear-gradient(180deg,#143055,#102844); box-shadow:0 0 10px rgba(88,166,255,.4); }
+  .ackchip.passed{ border-color:var(--green); color:#c4f7d4; background:linear-gradient(180deg,#0f3a1d,#0e2c18); }
+  .ackchip.wait{ border-color:var(--amber); color:#ffe2ab; background:#1d1808; animation:pulse 1s infinite; }
+  .ackchip.missing{ border-color:var(--red); border-style:dashed; color:#ffd4d0; background:#2a1010; }
+  .barriers{ border:1px dashed var(--line); border-radius:10px; padding:8px 10px; margin-top:4px; }
+  .barlabel{ font-size:12px; color:var(--muted); margin-bottom:6px; text-align:center; }
+  .barrow{ display:flex; gap:8px; align-items:stretch; }
+  .barrow .barspacer{ width:190px; }
+  .barcols{ flex:1; display:flex; gap:8px; }
+  .bar{ flex:1; border:1.5px solid var(--line); border-radius:8px; padding:6px 4px; text-align:center;
+    font-size:11px; color:var(--muted); transition:all .3s; }
+  .bar .bcount{ display:block; font-size:13px; font-weight:800; margin-top:2px; color:#5b6470; }
+  .bar.waiting{ border-color:var(--amber); color:#ffe2ab; background:#1d1808; animation:pulse 1s infinite; }
+  .bar.waiting .bcount{ color:#ffe2ab; }
+  .bar.fired{ border-color:var(--green); color:#c4f7d4; background:#0f2a18; }
+  .bar.fired .bcount{ color:#c4f7d4; }
+  .bar.dead{ border-color:var(--red); color:#ffd4d0; background:#2a1010; box-shadow:0 0 12px rgba(248,81,73,.4); }
+  .bar.dead .bcount{ color:#ffd4d0; }
+</style>
+</head>
+<body>
+<button id="langBtn" onclick="toggleLang()">EN</button>
+<header>
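+  <h1>HiCache × Pipeline Parallel: Tree Consistency &amp; Deadlock Avoidance</h1>
+  <p>Topology <b>PP=3 × TP=8 = 24 ranks</b> · rows = TP groups, columns = PP groups · MIN all-reduce keeps the radix tree consistent · 2 sets of gloo groups prevent background-collective deadlock</p>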
+</header>
+
+<div class="tabs">
+  <button class="tab active" data-tab="story">① Two-request lifecycle (L3 hit/miss · consistent host tree)</button>
+  <button class="tab" data-tab="consistency">② Tree consistency (auto-play)</button>
+  <button class="tab" data-tab="deadlock">③ Why 2 group sets avoid deadlock</button>
+  <button class="tab" data-tab="skew">④ Async skew × MIN unifies the pace</button>
+  <button class="tab" data-tab="threads">⑤ Thread relationships &amp; tree consistency</button>
+  <button class="tab" data-tab="ackalign">⑥ PrefetchAck alignment &amp; hang prevention</button>
+</div>
+
+<div class="wrap">
+  <!-- ============ TAB 5 (story, shown first) ============ -->
+  <div class="scene" id="scene5">
+    <div class="legend">
+      <span class="chip"><span class="sw" style="background:var(--blue)"></span>GPU compute / inserting</span>
+      <span class="chip"><span class="sw" style="background:var(--cyan)"></span>matched prefix</span>
+      <span class="chip"><span class="sw" style="background:var(--amber)"></span>ranks disagree (pending MIN)</span>
+      <span class="chip"><span class="sw" style="background:var(--green)"></span>committed / consistent</span>
+      <span class="chip"><span class="sw" style="background:var(--red)"></span>miss / evicted</span>
+    </div>
+    <div class="story-top">
+      <div class="gpu-badge" id="gpuBadge"><span class="ic">▣</span><span>GPU compute</span></div>
+      <div class="l3box" id="l3box">
+        <div class="l3title"><span class="l3lab"><b>L3 persistent storage</b> (storage backend, one view shared by all 3 ranks)</span><span class="badge" id="l3badge"></span></div>
+        <div class="pagerow" id="l3pages"></div>
+      </div>
+    </div>
+    <div class="ranks" id="ranks"></div>
+    <div class="story-sync">
+      <div class="syncbadge g1" id="s5sync1">◆ MIN group set 1 · prefetch_hits_sync_groups · storage_hit_count</div>
+      <div class="syncbadge g2" id="s5sync2">◆ MIN group set 2 · prefetch_completion_sync_groups · completed_tokens</div>
+    </div>
+    <div class="consist-flag" id="s5flag"></div>
+    <div class="caption" id="cap5">Auto-playing…</div>
+  </div>
+  <div class="controls" id="ctl5">
+    <button class="ctl primary" id="play5">⏸ Pause</button>
+    <button class="ctl" id="replay5">⟲ Replay</button>
+  </div>
+
+  <!-- ============ TAB 1 ============ -->
+  <div class="scene hidden" id="scene1">
+    <div class="legend">
+      <span class="chip"><span class="sw" style="background:var(--amber)"></span>hit count truncated (inconsistent)</span>
+      <span class="chip"><span class="sw" style="background:var(--blue)"></span>after MIN within the TP group</span>
+      <span class="chip"><span class="sw" style="background:var(--green)"></span>after MIN within the PP group (globally consistent)</span>
+    </div>
+    <div id="mesh1"></div>
+    <div class="tree-box">
+      <div class="tree-title" id="treeTitle">All 24 ranks share the same radix tree</div>
+      <div class="tree" id="sharedTree"></div>
+    </div>
+    <div class="caption" id="cap1">Auto-playing…</div>
+  </div>
+  <div class="controls">
+    <button class="ctl primary" id="play1">⏸ Pause</button>
+    <button class="ctl" id="replay1">⟲ Replay</button>
+  </div>
+
+  <!-- ============ TAB 2 ============ -->
+  <div class="scene hidden" id="scene2">
+    <div class="legend">
+      <span class="chip"><span class="sw" style="background:var(--purple)"></span><b>prefetch_thread</b> (independent background thread) · reduce(storage_hit_count)</span>
+      <span class="chip"><span class="sw" style="background:var(--cyan)"></span><b>prefetch_sync_thread</b> (independent background thread) · reduce(completed_tokens)</span>
+    </div>
+    <div class="note" style="margin-bottom:8px;">Each cell = 1 rank running 2 independent background threads (the small dots ●A ●B). Each row is one <b>TP communicator</b>; each column is one <b>PP communicator</b>.</div>
+    <div id="mesh2"></div>
+    <div class="groups">
+      <div class="grp g1" id="g1"><b>prefetch_hits_sync_groups</b><br>hit-page-count reduction groups (TP rings + PP rings)<br><span style="font-size:11px">reduce(storage_hit_count)</span></div>
+      <div class="grp g2" id="g2"><b>prefetch_completion_sync_groups</b><br>completed-token reduction groups (TP rings + PP rings)<br><span style="font-size:11px">reduce(completed_tokens)</span></div>
+    </div>
+    <div class="banner" id="banner2"></div>
+    <div class="caption" id="cap2">Pick a scenario: <b>1 group set</b> deadlocks; <b>2 group sets</b> are safe.</div>
+  </div>
+  <div class="controls" id="ctl2">
+    <button class="ctl alt" id="play1grp">▶ 1 group set (deadlock)</button>
+    <button class="ctl primary" id="play2grp">▶ 2 group sets (safe)</button>
+    <button class="ctl" id="reset2">Reset</button>
+  </div>
+
+  <!-- ============ TAB 3 ============ -->
+  <div class="scene hidden" id="scene3">
+    <!-- layer 3: pipeline timing (top) -->
+    <div class="t3-section">
+      <div class="t3-title">③ Main PP pipeline execution <strong>timeline</strong> <span class="tag gpu">NCCL · GPU</span>
+        <span style="color:var(--muted);font-size:11px;">continuous, staggered flow, <strong style="color:var(--green)">never interrupted by background prefetch sync</strong></span>
+      </div>
+      <div class="pipe" id="pipe">
+        <div class="lane s0"><span class="lname">PP stage 0</span><span class="lane-hint" id="hint0"></span></div>
+        <div class="lane s1"><span class="lname">PP stage 1</span><span class="lane-hint" id="hint1"></span></div>
+        <div class="lane s2"><span class="lname">PP stage 2</span><span class="lane-hint" id="hint2"></span></div>
+      </div>
+      <div class="flow-note">↑ The pipeline runs exactly the <strong>mb0→mb3</strong> formed in ②, advancing diagonally across stage0→1→2 with a stagger</div>
+    </div>
+
+    <div class="t3-conduit" id="arrow1">
+      <span class="clabel">▲ The formed <b>batch &amp; micro-batch order</b> is fed to the pipeline (content)</span>
+      <span class="spark"></span><span class="spark s2"></span>
+    </div>
+
+    <!-- layer 2: batch former (middle) -->
+    <div class="t3-section">
+      <div class="t3-title">② All three PP ranks form batches from the <strong>same storage hit</strong> (contents must match on every rank)</div>
+      <div class="formers" id="formers"></div>
+    </div>
+
+    <div class="t3-conduit" id="arrow2">
+      <span class="clabel">▲ <code>all_reduce(MIN)</code> yields the unified value <b>6</b> → determines batch size</span>
+      <span class="spark"></span><span class="spark s2"></span>
+    </div>
+
+    <!-- layer 1: async prefetch + MIN (bottom = causal source) -->
+    <div class="t3-section">
+      <div class="t3-title">① Async prefetch query → <code>all_reduce(MIN)</code> <span class="tag cpu">gloo · CPU background thread</span></div>
+      <div class="sync-area" id="syncArea">
+        <div class="clock" id="t3clock">t = 0.0s</div>
+        <div class="barrier" id="barrier" style="left:62%"><span class="blabel">all_reduce(MIN)</span></div>
+        <div class="slabel" id="sl0">rank pp0</div><div class="pkt travel" id="pkt0"><span>pp0 storage hit</span><span class="hv">8</span></div>
+        <div class="slabel" id="sl1">rank pp1</div><div class="pkt travel" id="pkt1"><span>pp1 storage hit</span><span class="hv">6</span></div>
+        <div class="slabel" id="sl2">rank pp2</div><div class="pkt travel" id="pkt2"><span>pp2 storage hit</span><span class="hv">7</span></div>
+      </div>
+    </div>
+    <div class="caption" id="cap3">Auto-playing…</div>
+  </div>
+  <div class="controls" id="ctl3">
+    <button class="ctl primary" id="play3">⏸ Pause</button>
+    <button class="ctl" id="replay3">⟲ Replay</button>
+  </div>
+
+  <!-- ============ TAB 4 ============ -->
+  <div class="scene hidden" id="scene4">
+    <div class="t4wrap">
+      <!-- left: thread data-flow -->
+      <div class="t4flow">
+        <div class="tbox sched" id="t4b0">
+          <div class="tname">Scheduler <span class="pin">main thread</span></div>
+          <div class="tdesc">Issues prefetch requests (writeback / load)</div>
+        </div>
+        <div class="tarrow" id="t4a0">▼ <b>prefetch_queue</b> (PrefetchOperation)</div>
+
+        <div class="tbox thread-hit" id="t4b1">
+          <div class="tname">① prefetch_thread <span class="pin">storage-hit thread</span></div>
+          <div class="tdesc"><code>_storage_hit_query()</code> queries the L3 hit page count; enough hits → push to prefetch_buffer, too few → prefetch_revoke_queue</div>
+        </div>
+        <div class="minnode" id="t4m1">◆ all_reduce(MIN) storage_hit_count
+          <small>@ prefetch_hits_sync_groups (group set 1, gloo/CPU, TP rings + PP rings)</small></div>
+        <div class="tarrow" id="t4a1">▼ <b>prefetch_buffer</b></div>
+
+        <div class="tbox thread-io" id="t4b2">
+          <div class="tname">② prefetch_io_aux_thread <span class="pin">IO load thread</span></div>
+          <div class="tdesc"><code>_page_transfer()</code> reads pages from L3 into host batch by batch; accumulates <b>completed_tokens</b>; <b>emits 1 PrefetchAck per storage batch</b> (even on error)</div>
+        </div>
+        <div class="tarrow" id="t4a2">▼ <b>prefetch_sync_queue</b> (PrefetchAck)</div>
+
+        <div class="tbox thread-sync" id="t4b3">
+          <div class="tname">③ prefetch_sync_thread <span class="pin">completion-token thread</span></div>
+          <div class="tdesc">Reduces the <b>completed_tokens</b> of every ack</div>
+        </div>
+        <div class="minnode g2" id="t4m2">◆ all_reduce(MIN) completed_tokens
+          <small>@ prefetch_completion_sync_groups (group set 2, gloo/CPU, TP rings + PP rings)</small></div>
+        <div class="tarrow" id="t4a3">▼ <b>ack_prefetch_queue</b></div>
+
+        <div class="tbox sched" id="t4b4">
+          <div class="tname">Scheduler writes the host radix tree</div>
+          <div class="tdesc">Inserts only the prefix of length <b>completed_tokens</b> → <code>_insert_helper_host()</code></div>
+        </div>
+      </div>
+
+      <!-- right: why consistent -->
+      <div class="t4why">
+        <div class="whycard a" id="t4wa">
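+          <h4>Why does MIN(storage_hit) give consistency?</h4>
+          <p>Hit counts can differ across ranks (host-memory truncation, differing L3 views). MIN picks the <b>longest prefix that hits on every rank</b> → all ranks <b>fetch the same range</b> rather than different lengths.</p>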
+        </div>
+        <div class="whycard b" id="t4wb">
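+          <h4>Why does MIN(completed_tokens) give consistency?</h4>
+          <p>Even with the same fetch range, the actual page-by-page load can still <b>partially fail</b> (<code>page_get</code> returns n≠batch). MIN commits only the <b>longest common prefix that every rank loaded into host memory successfully</b> → the length written to the host tree is identical on every rank.</p>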
+        </div>
+        <div class="whycard c" id="t4wc">
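+          <h4>Why doesn't it hang?</h4>
+          <p>Every storage batch <b>emits exactly one PrefetchAck</b> (even on error) → every rank joins <b>exactly the same number</b> of reduces, so the collectives pair up one-to-one. Together the two MINs guarantee: <b>the prefix inserted into the host tree is identical on every rank → consistent trees</b>.</p>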
+        </div>
+      </div>
+    </div>
+    <div class="caption" id="cap4">Two MIN sync points (group set 1: hit count; group set 2: completed count) plus exactly 1 ack per batch together keep the host radix tree strictly consistent across PP ranks.</div>
+  </div>
+  <div class="controls" id="ctl4">
+    <button class="ctl primary" id="play4">⏸ Pause</button>
+    <button class="ctl" id="replay4">⟲ Replay</button>
+  </div>
+
+  <!-- ============ TAB 6 : PrefetchAck alignment & anti-hang ============ -->
+  <div class="scene hidden" id="scene6">
+    <div class="note" style="margin-bottom:10px;">Each <b>storage batch</b> always emits <b>1 PrefetchAck</b> in <code>_page_transfer</code>; <code>prefetch_sync_thread</code> runs one <code>all_reduce(MIN)</code> on group set 2 for <strong>every ack</strong>. So <b>ack count = batch count = number of group-set-2 collectives</b>, and it must be equal on every rank.</div>
+    <div class="legend">
+      <span class="chip"><span class="sw" style="background:var(--blue)"></span>ack emitted (joins this round's reduce)</span>
+      <span class="chip"><span class="sw" style="background:var(--green)"></span>barrier has 3/3 → passes</span>
+      <span class="chip"><span class="sw" style="background:var(--amber)"></span>arrived, waiting for the absent rank</span>
+      <span class="chip"><span class="sw" style="background:var(--red)"></span>missing ack → never arrives</span>
+    </div>
+    <div class="ackmesh" id="ackmesh"></div>
+    <div class="barriers">
+      <div class="barlabel">◆ <code>all_reduce(MIN)</code> @ group set 2 (prefetch_completion_sync_groups) · one barrier per ack</div>
+      <div class="barrow"><div class="barspacer"></div><div class="barcols" id="barcols"></div></div>
+    </div>
+    <div class="banner" id="banner6"></div>
+    <div class="caption" id="cap6">Pick a scenario: <b>always 1 ack per batch</b> → counts align, safe; <b>break on error</b> → one ack goes missing → group-set-2 reduces misalign → hang.</div>
+  </div>
+  <div class="controls" id="ctl6">
+    <button class="ctl primary" id="play6good">▶ Correct (always 1 ack per batch)</button>
+    <button class="ctl alt" id="play6bad">▶ Wrong (break on error → missing ack)</button>
+    <button class="ctl" id="reset6">Reset</button>
+  </div>
+</div>
+
+<script>
+const PP=3, TP=8;
+// initial hit counts per (pp,tp). some truncated by host-mem pressure.
+const HITS=[
+  [8,8,8,8,8,8,8,8],
+  [8,8,6,8,8,8,8,7],   // stage1: rank(1,2)=6, rank(1,7)=7 truncated
+  [8,8,8,8,8,8,8,8],
+];
+const ROWMIN = HITS.map(r=>Math.min(...r));      // [8,6,8]
+const GMIN = Math.min(...ROWMIN);                // 6
+const sleep=ms=>new Promise(r=>setTimeout(r,ms));
+
+/* ---------- build a mesh ---------- */
+function buildMesh(containerId, withThreads){
+  const c=document.getElementById(containerId);
+  let html='<div class="mesh-head"><div class="tp-hdr">';
+  for(let t=0;t<TP;t++) html+=`<div class="th">TP ${t}</div>`;
+  html+='</div></div>';
+  for(let p=0;p<PP;p++){
+    html+=`<div class="pp-row"><div class="pp-label"><b>PP stage ${p}</b><br>(TP group)</div><div class="row-cells" id="${containerId}-row${p}">`;
+    for(let t=0;t<TP;t++){
+      html+=`<div class="cell" id="${containerId}-c${p}-${t}">
+        <div class="v">—</div>
+        <div class="rk">r${p*TP+t}</div>
+        ${withThreads?`<div class="tdots"><span class="td a" id="${containerId}-A-${p}-${t}"></span><span class="td b" id="${containerId}-B-${p}-${t}"></span></div>`:''}
+      </div>`;
+    }
+    html+='</div></div>';
+  }
+  // pp-group footer (columns)
+  html+='<div class="pp-foot"><div class="lab">PP groups (columns)<br>each column spans 3 stages →</div><div class="cols">';
+  for(let t=0;t<TP;t++) html+=`<div class="col" id="${containerId}-col${t}">r${t}·r${t+TP}·r${t+2*TP}</div>`;
+  html+='</div></div>';
+  c.innerHTML=html;
+}
+const cell=(m,p,t)=>document.getElementById(`${m}-c${p}-${t}`);
+const val =(m,p,t)=>cell(m,p,t).querySelector('.v');
+
+buildMesh('mesh1',false);
+buildMesh('mesh2',true);
+
+/* ============================================================
+   TAB 1 : auto-play loop
+   query -> diverge -> TP all_reduce(MIN) -> PP all_reduce(MIN) -> consistent tree
+   ============================================================ */
+let t1Token=0, t1Paused=false;
+const cap1=document.getElementById('cap1');
+const sharedTree=document.getElementById('sharedTree');
+
+function resetMesh1(){
+  for(let p=0;p<PP;p++) for(let t=0;t<TP;t++){
+    const cl=cell('mesh1',p,t); cl.className='cell';
+    val('mesh1',p,t).innerHTML='—';
+  }
+  sharedTree.innerHTML='';
+}
+async function gate(my){ while(t1Paused){ await sleep(120); if(my!==t1Token) throw 0; } }
+async function step(ms,my){ await sleep(ms); await gate(my); if(my!==t1Token) throw 0; }
+
+async function runTab1(){
+  const my=++t1Token;
+  try{
+    while(true){
+      resetMesh1();
+      cap1.innerHTML='Topology <b>PP=3 × TP=8 = 24 ranks</b>: each PP stage holds 8 TP ranks.';
+      await step(1600,my);
+
+      // 1) independent query
+      for(let p=0;p<PP;p++) for(let t=0;t<TP;t++){
+        const v=HITS[p][t]; const cl=cell('mesh1',p,t);
+        val('mesh1',p,t).innerHTML=v+' <small>pg</small>';
+        if(v!==8) cl.classList.add('varied');
+        await step(35,my);   // honor pause and cancel a superseded run mid-fill
+      }
+      cap1.innerHTML='① Each rank queries L3 for prefix hits <span class="k">independently</span>. <b style="color:var(--amber)">Note that r10 and r15 are truncated by host-memory pressure</b> (6 / 7 pages).';
+      await step(2200,my);
+
+      // 2) diverge warning
+      cap1.innerHTML='② If each rank built its radix tree from its own hit count → tree depths diverge → later PP collectives hit a <b style="color:var(--red)">shape mismatch → crash</b>.';
+      for(let p=0;p<PP;p++) for(let t=0;t<TP;t++) if(HITS[p][t]!==8){ cell('mesh1',p,t).classList.add('bad'); }
+      await step(2200,my);
+      for(let p=0;p<PP;p++) for(let t=0;t<TP;t++) cell('mesh1',p,t).classList.remove('bad','varied');
+
+      // 3) TP all_reduce(MIN) — sweep each row (all rows in parallel)
+      cap1.innerHTML='③ Step 1: <code>all_reduce(MIN)</code> within each <span class="k">TP group (the 8 ranks in a row)</span>.';
+      for(let t=0;t<TP;t++){
+        for(let p=0;p<PP;p++) cell('mesh1',p,t).classList.add('sweep');
+        await step(110,my);
+        for(let p=0;p<PP;p++) cell('mesh1',p,t).classList.remove('sweep');
+      }
+      for(let p=0;p<PP;p++) for(let t=0;t<TP;t++){
+        const cl=cell('mesh1',p,t); cl.classList.add('tpmin');
+        val('mesh1',p,t).innerHTML=ROWMIN[p]+' <small>pg</small>';
+      }
+      cap1.innerHTML='③ After the TP-group reduce: <b>each row is now consistent</b> (PP0=8, PP1=6, PP2=8, the per-row minimums).';
+      await step(1900,my);
+
+      // 4) PP all_reduce(MIN) — sweep each column (top->bottom)
+      cap1.innerHTML='④ Step 2: <code>all_reduce(MIN)</code> within each <span class="k">PP group (the 3 ranks in a column)</span> → converges to the global minimum.';
+      for(let p=0;p<PP;p++){
+        for(let t=0;t<TP;t++) cell('mesh1',p,t).classList.add('sweep');
+        await step(180,my);
+        for(let t=0;t<TP;t++) cell('mesh1',p,t).classList.remove('sweep');
+      }
+      for(let p=0;p<PP;p++) for(let t=0;t<TP;t++){
+        const cl=cell('mesh1',p,t); cl.classList.remove('tpmin'); cl.classList.add('gmin');
+        val('mesh1',p,t).innerHTML=GMIN+' <small>pg</small>';
+      }
+      cap1.innerHTML='④ After the PP-group reduce: <b style="color:var(--green)">all 24 ranks have hit count = 6</b> (the longest common prefix).';
+      await step(1700,my);
+
+      // 5) shared consistent tree
+      cap1.innerHTML='⑤ Every rank prefetches / builds its tree only up to 6 pages → <span style="color:var(--green)">the radix trees on all 24 ranks are identical ✓</span>';
+      sharedTree.innerHTML='';
+      for(let i=0;i<GMIN;i++){
+        const n=document.createElement('div'); n.className='tnode'; n.textContent='page '+i; sharedTree.appendChild(n);
+        await step(120,my); n.classList.add('show');
+      }
+      await step(2600,my);
+    }
+  }catch(e){ /* cancelled */ }
+}
+
+document.getElementById('play1').onclick=function(){
+  t1Paused=!t1Paused;
+  this.textContent=t1Paused?'▶ Play':'⏸ Pause';
+};
+document.getElementById('replay1').onclick=()=>{ t1Paused=false; document.getElementById('play1').textContent='⏸ Pause'; runTab1(); };
+
+/* ============================================================
+   TAB 2 : why 2 group-sets avoid deadlock (PP×TP mesh)
+   ============================================================ */
+let t2Token=0;
+const cap2=document.getElementById('cap2');
+const banner2=document.getElementById('banner2');
+const g1=document.getElementById('g1'), g2=document.getElementById('g2');
+const row2=p=>document.getElementById('mesh2-row'+p);
+const col2=t=>document.getElementById('mesh2-col'+t);
+const dotEl=(op,p,t)=>document.getElementById(`mesh2-${op}-${p}-${t}`);
+
+function resetMesh2(){
+  ++t2Token;
+  for(let p=0;p<PP;p++){
+    row2(p).className='row-cells';
+    for(let t=0;t<TP;t++){
+      const cl=cell('mesh2',p,t); cl.className='cell';
+      val('mesh2',p,t).innerHTML=GMIN+' <small>pg</small>';
+      dotEl('A',p,t).className='td a'; dotEl('B',p,t).className='td b';
+    }
+  }
+  for(let t=0;t<TP;t++) col2(t).className='col';
+  g1.classList.remove('hot'); g2.classList.remove('hot'); g2.style.opacity=1;
+  banner2.className='banner'; banner2.textContent='';
+}
+async function s2(ms,my){ await sleep(ms); if(my!==t2Token) throw 0; }
+
+/* ---- 1 shared group set -> deadlock ---- */
+async function play1Group(){
+  resetMesh2(); const my=t2Token;
+  try{
+    g2.style.opacity=.25; g1.classList.add('hot');
+    cap2.innerHTML='Only <b>1 group set</b>: prefetch_thread (A) and prefetch_sync_thread (B) share the same set of communicators.';
+    await s2(800,my);
+
+    // each rank's two threads race: some submit A first (purple), some B first (cyan)
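+    cap2.innerHTML='The two background threads are <b>scheduled independently, in no fixed order</b>: within one TP ring, some ranks submit A first and others submit B first.';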
+    for(let p=0;p<PP;p++){
+      for(let t=0;t<TP;t++){
+        const aFirst=((p+t)%2===0);      // deterministic but mixed within each row
+        const first=aFirst?'A':'B';
+        dotEl(first,p,t).classList.add('on');
+      }
+    }
+    await s2(1200,my);
+
+    // rings cannot align -> red
+    cap2.innerHTML='The collectives that ranks submit on the same communicator are <b style="color:var(--red)">not the same one</b> (A and B misaligned) → the rendezvous never pairs up.';
+    for(let p=0;p<PP;p++){
+      row2(p).classList.add('ring-bad');
+      for(let t=0;t<TP;t++){
+        cell('mesh2',p,t).classList.add('bad');
+        const aFirst=((p+t)%2===0);
+        dotEl(aFirst?'A':'B',p,t).className=`td ${aFirst?'a':'b'} dead`;
+      }
+    }
+    for(let t=0;t<TP;t++) col2(t).classList.add('ring-bad');
+    await s2(700,my);
+    banner2.className='banner bad'; banner2.textContent='💥 DEADLOCK: the whole 24-rank job hangs';
+    cap2.innerHTML='As soon as A and B interleave on any communicator, that ring deadlocks → global PP/TP communication stalls in a chain reaction.';
+  }catch(e){}
+}
+
+/* ---- 2 group sets -> safe ---- */
+async function play2Groups(){
+  resetMesh2(); const my=t2Token;
+  try{
+    g1.classList.add('hot'); g2.classList.add('hot');
+    cap2.innerHTML='With <b>2 independent group sets</b>: <b style="color:var(--purple)">A always uses prefetch_hits_sync_groups</b>, <b style="color:var(--cyan)">B always uses prefetch_completion_sync_groups</b>.';
+    await s2(800,my);
+
+    // wave A: every rank's prefetch_thread submits A to group-set-1; TP rings + PP rings light purple
+    cap2.innerHTML='Wave 1: the <b>prefetch_thread</b> on every rank submits A only on <code>prefetch_hits_sync_groups</code> → identical sequence.';
+    for(let p=0;p<PP;p++){
+      row2(p).classList.add('ring-a');
+      for(let t=0;t<TP;t++) dotEl('A',p,t).classList.add('on');
+    }
+    for(let t=0;t<TP;t++) col2(t).classList.add('ring-a');
+    await s2(1100,my);
+    for(let p=0;p<PP;p++){
+      row2(p).className='row-cells';
+      for(let t=0;t<TP;t++) dotEl('A',p,t).className='td a done';
+    }
+    for(let t=0;t<TP;t++) col2(t).className='col';
+    cap2.innerHTML='✓ All A submissions have arrived on the TP rings + PP rings → wave 1 reduce complete.';
+    await s2(900,my);
+
+    // wave B: prefetch_sync_thread submits B to group-set-2; rings light cyan
+    cap2.innerHTML='Wave 2: the <b>prefetch_sync_thread</b> on every rank submits B only on <code>prefetch_completion_sync_groups</code> → identical sequence.';
+    for(let p=0;p<PP;p++){
+      row2(p).classList.add('ring-b');
+      for(let t=0;t<TP;t++) dotEl('B',p,t).classList.add('on');
+    }
+    for(let t=0;t<TP;t++) col2(t).classList.add('ring-b');
+    await s2(1100,my);
+    for(let p=0;p<PP;p++){
+      row2(p).className='row-cells';
+      for(let t=0;t<TP;t++){ dotEl('B',p,t).className='td b done'; cell('mesh2',p,t).classList.add('gmin'); }
+    }
+    for(let t=0;t<TP;t++) col2(t).className='col';
+    banner2.className='banner ok'; banner2.textContent='✅ Safe: all 24 ranks aligned and done';
+    cap2.innerHTML='The collective sequence on every communicator is <b style="color:var(--green)">identical</b> across all ranks (A→group set 1, B→group set 2, never crossing) → no deadlock.';
+  }catch(e){}
+}
+
+document.getElementById('play1grp').onclick=play1Group;
+document.getElementById('play2grp').onclick=play2Groups;
+document.getElementById('reset2').onclick=()=>{ resetMesh2(); cap2.innerHTML='Pick a scenario: <b>1 group set</b> deadlocks; <b>2 group sets</b> are safe.'; };
+
+/* ============================================================
+   TAB 3 : async PP skew  x  all_reduce(MIN) unifies pace
+   Top: continuous, skewed micro-batch pipeline (CSS infinite) — never pauses.
+   Bottom: time-driven (rAF). Each rank's prefetch op arrives at the MIN barrier
+   at a DIFFERENT wall-clock time (async skew). Early arrivals park & wait on the
+   gloo CPU group (background thread). When the slowest arrives, one MIN flash
+   unifies all three to 6 and they depart together — while the top pipeline keeps
+   flowing untouched.
+   ============================================================ */
+const NMB=4;                          // micro-batches per batch (illustrative)
+const MB_LABELS=Array.from({length:NMB},(_,k)=>'mb'+k);
+// build top pipeline micro-batches: one controllable block per (stage, mb)
+(function buildPipe(){
+  for(let s=0;s<3;s++){
+    const lane=document.querySelector('#pipe .s'+s);
+    for(let k=0;k<NMB;k++){
+      const mb=document.createElement('div');
+      mb.className='mb'; mb.id=`pmb-${s}-${k}`; mb.textContent=MB_LABELS[k];
+      lane.appendChild(mb);
+    }
+  }
+})();
+// build batch formers (one per PP rank): hit input + ordered mb chips
+(function buildFormers(){
+  const host=document.getElementById('formers');
+  for(let p=0;p<3;p++){
+    const f=document.createElement('div'); f.className='former'; f.id='former'+p;
+    let chips='';
+    for(let k=0;k<NMB;k++) chips+=`<div class="mbchip" id="fchip-${p}-${k}">${MB_LABELS[k]}</div>`;
+    f.innerHTML=`<h5>调度器 · PP rank ${p}</h5>
+      <div class="hitbox"><span class="ht1">已缓存前缀 storage hit = </span><b id="fhit${p}">？</b><span class="ht2"> 页 → 决定 batch 组成</span></div>
+      <div class="mbrow">${chips}</div>
+      <div class="chk" id="fchk${p}"></div>`;
+    host.appendChild(f);
+  }
+})();
+const arrow1=document.getElementById('arrow1');
+const arrow2=document.getElementById('arrow2');
+
+const PKT=[
+  { el:document.getElementById('pkt0'), lab:document.getElementById('sl0'), y:34,  hit:8, arrive:2.6 },
+  { el:document.getElementById('pkt1'), lab:document.getElementById('sl1'), y:85,  hit:6, arrive:1.9 }, // arrives first
+  { el:document.getElementById('pkt2'), lab:document.getElementById('sl2'), y:136, hit:7, arrive:3.9 }, // slowest -> everyone waits
+];
+const GMIN3=Math.min(...PKT.map(p=>p.hit)); // 6
+const T3={ START:0.4, SYNC:3.9, FLASH_END:4.5, DEPART_END:5.4,
+           DROP:4.5, MB_START:5.0, MB_STEP:0.35, READY:6.4, CYCLE:14.0,
+           PIPE_START:6.4, MB_LAG:1.0, STAGE_LAG:0.9, MB_TRAVEL:3.2,
+           X0:11, XB:62, X1:97 }; // times are in seconds; X0/XB/X1 are horizontal positions in percent
+const cap3=document.getElementById('cap3');
+const barrier=document.getElementById('barrier');
+const t3clock=document.getElementById('t3clock');
+let t3raf=null, t3on=false, t3paused=false, t3start=0, t3lastT=0;
+
+function placePkt(p){ p.lab.style.top=p.y+'px'; p.el.style.top=p.y+'px'; }
+PKT.forEach(placePkt);
+function lerp(a,b,u){ return a+(b-a)*Math.max(0,Math.min(1,u)); }
+// returns progress 0..1 if t (or its wrapped form) is inside [st,en), else -1
+function pipeActive(st,en,t){
+  if(t>=st && t<en) return (t-st)/(en-st);
+  const tw=t+T3.CYCLE;
+  if(tw>=st && tw<en) return (tw-st)/(en-st);   // previous batch still draining across loop
+  return -1;
+}
+
+function resetFormers(){
+  arrow1.classList.remove('hot'); arrow2.classList.remove('hot');
+  for(let p=0;p<3;p++){
+    document.getElementById('fhit'+p).textContent='？';
+    document.getElementById('former'+p).classList.remove('ready');
+    document.getElementById('fchk'+p).textContent='';
+    for(let k=0;k<NMB;k++) document.getElementById(`fchip-${p}-${k}`).className='mbchip';
+  }
+}
+
+function t3frame(now){
+  if(!t3on){ return; }
+  if(!t3paused){
+    let t=(Math.max(0, now - t3start)/1000) % T3.CYCLE;   // clamp: the rAF timestamp can precede t3start by a few ms
+    t3lastT=t;
+    t3clock.textContent='t = '+t.toFixed(1)+'s';
+    barrier.classList.toggle('fire', (t>=T3.SYNC && t<T3.FLASH_END));
+
+    // --- layer 1: async packets toward MIN barrier ---
+    PKT.forEach(p=>{
+      let xPct, cls;
+      if(t < T3.START){ xPct=T3.X0; cls='travel'; }
+      else if(t < p.arrive){ xPct=lerp(T3.X0, T3.XB, (t-T3.START)/(p.arrive-T3.START)); cls='travel'; }
+      else if(t < T3.FLASH_END){ xPct=T3.XB; cls='wait'; }
+      else if(t < T3.DEPART_END){ xPct=lerp(T3.XB, T3.X1, (t-T3.FLASH_END)/(T3.DEPART_END-T3.FLASH_END)); cls='unified'; }
+      else { xPct=T3.X1; cls='unified'; }
+      p.el.style.left='calc('+xPct+'% - 54px)';
+      p.el.className='pkt '+cls;
+      p.el.querySelector('.hv').textContent = (t>=T3.SYNC? GMIN3 : p.hit);
+    });
+
+    // --- layer 2: batch formers driven by unified value ---
+    if(t < T3.FLASH_END){
+      resetFormers();
+    } else {
+      arrow2.classList.add('hot');                 // MIN -> value out
+      for(let p=0;p<3;p++) document.getElementById('fhit'+p).textContent=GMIN3;
+      // light mb chips in identical order across all three formers
+      for(let k=0;k<NMB;k++){
+        if(t >= T3.MB_START + k*T3.MB_STEP){
+          for(let p=0;p<3;p++) document.getElementById(`fchip-${p}-${k}`).classList.add('on');
+        }
+      }
+      if(t >= T3.READY){
+        arrow1.classList.add('hot');               // batch -> pipeline content
+        for(let p=0;p<3;p++){
+          document.getElementById('former'+p).classList.add('ready');
+          document.getElementById('fchk'+p).textContent='✓ batch & mb 顺序一致';
+          for(let k=0;k<NMB;k++) document.getElementById(`fchip-${p}-${k}`).className='mbchip fixed';
+        }
+      } else {
+        arrow1.classList.remove('hot');
+      }
+    }
+
+    // --- layer 3: the formed mb0..mb3 flow through stage0->1->2 (diagonal pipeline) ---
+    for(let s=0;s<3;s++){
+      for(let k=0;k<NMB;k++){
+        const b=document.getElementById(`pmb-${s}-${k}`);
+        const st=T3.PIPE_START + k*T3.MB_LAG + s*T3.STAGE_LAG;
+        const u=pipeActive(st, st+T3.MB_TRAVEL, t);   // handles cycle wrap (previous batch still draining)
+        if(u>=0){ b.style.left=(19+u*73)+'%'; b.style.opacity=1; }
+        else b.style.opacity=0;
+      }
+      document.getElementById('hint'+s).textContent = (t>1.4 && t<T3.PIPE_START) ? '（等待②组好的 batch…）' : '';
+    }
+
+    // --- captions ---
+    if(t < T3.START) cap3.innerHTML='① 三个 PP rank 的 prefetch 查询<strong>异步发起</strong>（到达时刻不同）。';
+    else if(t < PKT[2].arrive) cap3.innerHTML='① 先到的 rank 在 <span class="k">gloo CPU 后台线程</span>上<b style="color:var(--amber)">等待对齐</b>（不占 GPU）。';
+    else if(t < T3.FLASH_END) cap3.innerHTML='① <b style="color:var(--amber)">pp2 最慢</b>到达 → <code>all_reduce(MIN)</code> 把 8/6/7 <strong style="color:var(--green)">统一成 6</strong>。';
+    else if(t < T3.READY) cap3.innerHTML='② 统一后的 <b>storage hit = 6</b> 下发给各 rank 调度器 → 决定<strong>已缓存前缀长度 / batch size / micro-batch 顺序</strong>（mb0→mb3）。';
+    else cap3.innerHTML='③ 三个 rank 因拿到<strong style="color:var(--green)">同一个 6</strong> 而组出<strong style="color:var(--green)">完全一致的 batch 与 mb 顺序</strong>，喂给 PP 流水线；执行时序连续不被打断。<br><span style="color:var(--red)">⚠ 若 storage hit 不统一 → batch/mb 顺序逐 rank 发散 → PP 调度错位、卡死。</span>';
+  }
+  t3raf=requestAnimationFrame(t3frame);
+}
+function startTab3(restart){
+  t3on=true;
+  if(restart || !t3start){ t3start=performance.now(); t3paused=false; document.getElementById('play3').textContent='⏸ 暂停'; }
+  document.getElementById('pipe').classList.remove('paused');
+  if(!t3raf) t3raf=requestAnimationFrame(t3frame);
+}
+function stopTab3(){ t3on=false; if(t3raf){ cancelAnimationFrame(t3raf); t3raf=null; } }
+
+document.getElementById('play3').onclick=function(){
+  t3paused=!t3paused;
+  t3start=performance.now() - t3lastT*1000;   // re-anchor the clock so playback resumes from the frozen t
+  this.textContent=t3paused?'▶ 播放':'⏸ 暂停';
+  document.getElementById('pipe').classList.toggle('paused', t3paused);
+};
+document.getElementById('replay3').onclick=()=>startTab3(true);
+
+/* ============================================================
+   TAB 4 : animated walk-through of the prefetch thread pipeline.
+   Highlights each stage in sequence; the chain "lights up" as the
+   data (PrefetchOperation → Ack → completed_tokens) flows down, and
+   the matching right-side why-card glows at each MIN sync point.
+   ============================================================ */
+let t4Token=0, t4Paused=false;
+const cap4El=document.getElementById('cap4');
+// [ids-to-light, caption]
+const T4SEQ=[
+  [['t4b0'], '调度器主线程把 prefetch 请求（writeback / load）放入队列，触发后台流水线。'],
+  [['t4a0'], '<b>PrefetchOperation</b> 入队 <code>prefetch_queue</code>，交给后台线程处理。'],
+  [['t4b1'], '① <b>prefetch_thread</b> 调 <code>_storage_hit_query()</code> 查询 L3 命中页数（各 rank 可能不同）。'],
+  [['t4m1','t4wa'], '◆ <b style="color:var(--amber)">第一个 MIN</b>：在 <code>prefetch_hits_sync_groups</code>（组1）对 storage_hit_count 取最小 → <b>抓取范围逐 rank 一致</b>。'],
+  [['t4a1'], '命中足够的请求落入 <code>prefetch_buffer</code>，进入实际 IO 加载。'],
+  [['t4b2'], '② <b>prefetch_io_aux_thread</b> 用 <code>_page_transfer()</code> 逐 batch 把页 L3→host；<b>每个 batch 恒产生 1 个 PrefetchAck</b>（出错也产生）。'],
+  [['t4a2'], '每个 batch 的 <b>PrefetchAck</b> 入队 <code>prefetch_sync_queue</code>。'],
+  [['t4b3'], '③ <b>prefetch_sync_thread</b> 对每个 ack 的 <b>completed_tokens</b> 做归约。'],
+  [['t4m2','t4wb'], '◆ <b style="color:var(--green)">第二个 MIN</b>：在 <code>prefetch_completion_sync_groups</code>（组2）对 completed_tokens 取最小 → <b>真正落盘前缀逐 rank 一致</b>。'],
+  [['t4a3'], '统一后的结果入队 <code>ack_prefetch_queue</code> 回到调度器。'],
+  [['t4b4','t4wc'], '调度器只插入 <b>completed_tokens</b> 长度的前缀 → <code>_insert_helper_host()</code>。每 batch 恒 1 个 ack，<b>reduce 次数严格相等 → 不会 hang</b>。'],
+];
+function t4clear(){
+  document.querySelectorAll('#scene4 .lit').forEach(e=>e.classList.remove('lit'));
+  document.querySelectorAll('#scene4 .dimmed').forEach(e=>e.classList.remove('dimmed'));
+}
+async function t4gate(my){ while(t4Paused){ await sleep(120); if(my!==t4Token) throw 0; } }
+async function t4step(ms,my){ await sleep(ms); await t4gate(my); if(my!==t4Token) throw 0; }
+async function runTab4(){
+  const my=++t4Token;
+  try{
+    while(true){
+      t4clear();
+      cap4El.innerHTML='沿数据流向下逐步点亮：两个 MIN 同步点 + 每 batch 恒定 1 个 ack。';
+      await t4step(1200,my);
+      for(const [ids,cap] of T4SEQ){
+        await t4gate(my);
+        ids.forEach(id=>document.getElementById(id).classList.add('lit'));
+        cap4El.innerHTML=cap;
+        await t4step(1900,my);
+      }
+      cap4El.innerHTML='✅ 闭环：两个 MIN（组1 命中数 + 组2 完成数）+ 每 batch 1 个 ack → <b style="color:var(--green)">PP 各 rank 的 host radix tree 严格一致</b>。';
+      await t4step(2600,my);
+    }
+  }catch(e){ /* cancelled */ }
+}
+function stopTab4(){ ++t4Token; }
+
+document.getElementById('play4').onclick=function(){
+  t4Paused=!t4Paused;
+  this.textContent=t4Paused?'▶ 播放':'⏸ 暂停';
+};
+document.getElementById('replay4').onclick=()=>{ t4Paused=false; document.getElementById('play4').textContent='⏸ 暂停'; runTab4(); };
+
+/* ============================================================
+   TAB 5 : two-request full lifecycle (PP=3 × TP=4).
+   Req A misses (GPU compute → L2 insert → L3 backup), then L2 is
+   evicted (delete, identical across ranks); Req B hits L3 and goes
+   through the two MIN syncs so every PP rank inserts the SAME prefix
+   into its host radix tree → trees stay consistent, no deadlock.
+   ============================================================ */
+const NPG=4;                       // pages tracked in the story
+const RANK_NAMES=['PP rank 0','PP rank 1','PP rank 2'];
+(function buildStory(){
+  let h='';
+  for(let p=0;p<3;p++){
+    let tps=''; for(let t=0;t<4;t++) tps+=`<span class="tp" id="s5tp-${p}-${t}"></span>`;
+    let nodes='<span class="root">host root</span>';
+    for(let i=0;i<NPG;i++) nodes+=`<div class="htnode" id="s5n-${p}-${i}">p${i}</div>`;
+    h+=`<div class="ranklane" id="s5lane${p}">
+      <div class="rankhdr"><span class="rname">${RANK_NAMES[p]}</span><span class="tps">${tps}</span>
+        <span class="rstat" id="s5stat${p}">idle</span></div>
+      <div class="htree">${nodes}</div></div>`;
+  }
+  document.getElementById('ranks').innerHTML=h;
+  let l3=''; for(let i=0;i<NPG;i++) l3+=`<div class="pg" id="s5l3-${i}">p${i}</div>`;
+  document.getElementById('l3pages').innerHTML=l3;
+})();
+
+let t5Token=0, t5Paused=false;
+const cap5=document.getElementById('cap5');
+const s5n=(p,i)=>document.getElementById(`s5n-${p}-${i}`);
+const setNode=(p,i,cls)=>{ s5n(p,i).className='htnode show '+cls; };
+const hideNode=(p,i)=>{ s5n(p,i).className='htnode'; };
+function rstat(p,txt,cls){ const e=document.getElementById('s5stat'+p); e.className='rstat '+(cls||''); e.textContent=txt; }
+function s5flag(txt,cls){ const e=document.getElementById('s5flag'); e.className='consist-flag '+(cls||''); e.innerHTML=txt; }
+function s5reset(){
+  for(let p=0;p<3;p++){
+    document.getElementById('s5lane'+p).className='ranklane';
+    rstat(p,'idle','');
+    for(let t=0;t<4;t++) document.getElementById(`s5tp-${p}-${t}`).className='tp';
+    for(let i=0;i<NPG;i++) hideNode(p,i);
+  }
+  for(let i=0;i<NPG;i++) document.getElementById('s5l3-'+i).className='pg';
+  document.getElementById('gpuBadge').className='gpu-badge';
+  document.getElementById('l3box').className='l3box';
+  document.getElementById('l3badge').className='badge'; document.getElementById('l3badge').textContent='';
+  document.getElementById('s5sync1').className='syncbadge g1';
+  document.getElementById('s5sync2').className='syncbadge g2';
+  s5flag('','');
+}
+async function t5gate(my){ while(t5Paused){ await sleep(120); if(my!==t5Token) throw 0; } }
+async function t5step(ms,my){ await sleep(ms); await t5gate(my); if(my!==t5Token) throw 0; }
+const allRanks=fn=>{ for(let p=0;p<3;p++) fn(p); };
+
+async function runTab5(){
+  const my=++t5Token;
+  try{
+    while(true){
+      s5reset();
+      cap5.innerHTML='场景 <b>PP=3 × TP=4</b>：每个 PP rank 维护一棵 <b>L2 host radix tree</b>，共享底层 <b>L3 持久化存储</b>。跟踪两个请求，看 host tree 如何保持一致。';
+      await t5step(2000,my);
+
+      /* ===== ACT 1 : Request A — miss → GPU → L2 insert → L3 backup ===== */
+      cap5.innerHTML='① <b>请求 A</b> 到达（需要 4 个 page 的前缀），3 个 PP rank 同时处理。';
+      allRanks(p=>{ document.getElementById('s5lane'+p).classList.add('active'); for(let t=0;t<4;t++) document.getElementById(`s5tp-${p}-${t}`).className='tp on'; rstat(p,'req A',''); });
+      await t5step(1700,my);
+
+      cap5.innerHTML='① 查 L2 host tree → <b style="color:var(--red)">miss</b>；查 L3 → <b style="color:var(--red)">miss</b>（存储为空）。';
+      allRanks(p=>rstat(p,'L2/L3 miss','miss'));
+      document.getElementById('l3badge').className='badge miss'; document.getElementById('l3badge').textContent='miss';
+      await t5step(1900,my);
+
+      cap5.innerHTML='① 回退到 <b>GPU 前向计算</b>，生成这 4 个 page 的 KV。';
+      document.getElementById('gpuBadge').classList.add('busy');
+      allRanks(p=>rstat(p,'compute','warn'));
+      await t5step(1800,my);
+      document.getElementById('gpuBadge').classList.remove('busy');
+
+      cap5.innerHTML='① 计算结果写入 <b>L2 host radix tree</b> → 3 个 rank <code>insert</code> <strong style="color:var(--green)">相同</strong>的前缀 p0–p3。';
+      for(let i=0;i<NPG;i++){ allRanks(p=>setNode(p,i,'inserting')); await t5step(240,my); }
+      allRanks(p=>{ for(let i=0;i<NPG;i++) setNode(p,i,'committed'); rstat(p,'L2: 4','hit'); });
+      s5flag('✓ 3 棵 host tree 同步插入 4 个 page（一致）','ok');
+      await t5step(1600,my);
+
+      cap5.innerHTML='① backup 线程把 L2 → <b>L3</b> 持久化（<code>write_backup</code> / <code>page_set</code>）。';
+      document.getElementById('l3box').classList.add('hot');
+      document.getElementById('l3badge').className='badge hit'; document.getElementById('l3badge').textContent='stored';
+      for(let i=0;i<NPG;i++){ document.getElementById('s5l3-'+i).className='pg show l3'; await t5step(220,my); }
+      s5flag('','');
+      await t5step(1500,my);
+
+      /* ===== ACT 1.5 : L2 eviction (delete consistency) ===== */
+      cap5.innerHTML='② host 内存压力 → L2 触发<strong style="color:var(--red)">淘汰</strong>（<code>evict_host</code>）。3 棵 host tree <b>完全一致</b> → 淘汰命中<strong>同一批节点</strong>；L3 仍保留。';
+      for(let i=NPG-1;i>=0;i--){ allRanks(p=>setNode(p,i,'evict')); await t5step(330,my); allRanks(p=>hideNode(p,i)); }
+      allRanks(p=>rstat(p,'L2 empty',''));
+      s5flag('✓ 3 棵 host tree 同步删除（delete 一致）','ok');
+      await t5step(2000,my);
+      s5flag('','');
+
+      /* ===== ACT 2 : Request B — L3 hit → 2 MIN syncs → consistent insert ===== */
+      cap5.innerHTML='③ <b>请求 B</b> 到达（复用 A 的前缀）。L2 host tree 已空 → <b style="color:var(--red)">L2 miss</b>，转向 L3。';
+      allRanks(p=>{ document.getElementById('s5lane'+p).classList.add('active'); rstat(p,'req B','warn'); });
+      await t5step(1700,my);
+
+      cap5.innerHTML='③ <b>prefetch_thread</b> 各 rank 向 L3 查命中页数 → 结果可能<strong style="color:var(--amber)">不同</strong>（host 视图/内存差异）：4 / 3 / 4。';
+      const hitq=[4,3,4];
+      allRanks(p=>{ rstat(p,'L3 hit '+hitq[p], hitq[p]===4?'hit':'warn'); for(let i=0;i<hitq[p];i++) setNode(p,i,'warn'); });
+      s5flag('⚠ 查询长度不一致（4/3/4）→ 若各自建树，host tree 会发散','bad');
+      await t5step(2600,my);
+
+      cap5.innerHTML='◆ <b>第一个 MIN</b> @ <code>prefetch_hits_sync_groups</code>（组1，gloo/CPU，含 TP环+PP环）：<code>all_reduce(MIN)</code> 统一查询长度 = <b>3</b>。';
+      document.getElementById('s5sync1').classList.add('fire');
+      await t5step(1500,my);
+      allRanks(p=>{ for(let i=0;i<NPG;i++) hideNode(p,i); for(let i=0;i<3;i++) setNode(p,i,'matched'); rstat(p,'match 3','hit'); });
+      s5flag('✓ 抓取范围统一 = 3 → match_prefix 逐 rank 一致','ok');
+      await t5step(2200,my);
+
+      cap5.innerHTML='③ <b>prefetch_io_aux_thread</b> 逐 batch 把 page 从 L3 拉回 L2（<code>_page_transfer</code>），每 batch 产 1 个 PrefetchAck。';
+      document.getElementById('s5sync1').classList.remove('fire');
+      for(let i=0;i<3;i++){ allRanks(p=>setNode(p,i,'inserting')); await t5step(300,my); }
+      await t5step(700,my);
+
+      cap5.innerHTML='③ 逐页加载<strong style="color:var(--amber)">部分失败</strong>：rank2 第 3 页 <code>page_get</code> 未成功 → completed_tokens = 3 / 3 / 2。';
+      const done=[3,3,2];
+      allRanks(p=>{ for(let i=0;i<3;i++){ if(i<done[p]) setNode(p,i,'committed'); else setNode(p,i,'warn'); } rstat(p,'done '+done[p], done[p]===3?'hit':'warn'); });
+      s5flag('⚠ 实际落盘不一致（3/3/2）→ 若各自插入，host tree 会发散','bad');
+      await t5step(2600,my);
+
+      cap5.innerHTML='◆ <b>第二个 MIN</b> @ <code>prefetch_completion_sync_groups</code>（组2，<b>独立 communicator</b>）：<code>all_reduce(MIN)</code> 统一 completed_tokens = <b>2</b>。';
+      document.getElementById('s5sync2').classList.add('fire');
+      await t5step(1500,my);
+
+      cap5.innerHTML='③ 各 rank 只把统一的 <b>2 个 page</b> 插入 L2 host tree（<code>_insert_helper_host</code>）→ 3 棵 host tree <strong style="color:var(--green)">再次完全一致</strong>。';
+      allRanks(p=>{ for(let i=0;i<NPG;i++) hideNode(p,i); for(let i=0;i<2;i++) setNode(p,i,'committed'); rstat(p,'L2: 2','hit'); });
+      s5flag('✓ 插入长度统一 = 2 → 3 棵 host tree 完全一致','ok');
+      await t5step(2400,my);
+
+      cap5.innerHTML='✅ 两套<strong>独立 gloo 组</strong>（组1 命中数、组2 完成数）+ 每 batch 恒 1 个 ack → 各 rank 对 host tree 的<strong>插入/删除完全一致</strong> → <b style="color:var(--green)">host radix tree 始终一致，后台 collective 不会死锁</b>。';
+      document.getElementById('s5sync1').classList.add('fire');
+      await t5step(3400,my);
+      document.getElementById('s5sync1').classList.remove('fire');
+      document.getElementById('s5sync2').classList.remove('fire');
+      await t5step(700,my);
+    }
+  }catch(e){ /* cancelled */ }
+}
+function stopTab5(){ ++t5Token; }
+
+document.getElementById('play5').onclick=function(){
+  t5Paused=!t5Paused;
+  this.textContent=t5Paused?'▶ 播放':'⏸ 暂停';
+};
+document.getElementById('replay5').onclick=()=>{ t5Paused=false; document.getElementById('play5').textContent='⏸ 暂停'; runTab5(); };
+
+/* ============================================================
+   TAB 6 : PrefetchAck count alignment & anti-hang.
+   Each storage batch in _page_transfer emits exactly one PrefetchAck,
+   and prefetch_sync_thread does one all_reduce(MIN) on set-2 per ack.
+   So #acks == #batches == #set-2 collectives, and it must be equal on
+   every rank. We compare: (good) each batch always emits 1 ack even on
+   error → counts aligned → safe; (bad) break-on-error drops an ack →
+   one rank does fewer reduces → the others block forever → hang.
+   ============================================================ */
+const T6NB=3;                 // number of storage batches / acks
+(function buildAck(){
+  let m='';
+  for(let p=0;p<3;p++){
+    let slots='';
+    for(let k=0;k<T6NB;k++) slots+=`<div class="ackchip pending" id="ack-${p}-${k}">ack${k}</div>`;
+    m+=`<div class="ackrow" id="ackrow${p}"><div class="acklabel"><b>PP rank ${p}</b> · _page_transfer</div><div class="ackslots">${slots}</div></div>`;
+  }
+  document.getElementById('ackmesh').innerHTML=m;
+  let b='';
+  for(let k=0;k<T6NB;k++) b+=`<div class="bar" id="bar${k}">barrier ${k}<span class="bcount" id="bcnt${k}">0/3</span></div>`;
+  document.getElementById('barcols').innerHTML=b;
+})();
+
+let t6Token=0;
+const cap6=document.getElementById('cap6');
+const banner6=document.getElementById('banner6');
+const ackEl=(p,k)=>document.getElementById(`ack-${p}-${k}`);
+const barEl=k=>document.getElementById('bar'+k);
+const bcnt=k=>document.getElementById('bcnt'+k);
+function t6reset(){
+  ++t6Token;
+  for(let p=0;p<3;p++){
+    document.getElementById('ackrow'+p).className='ackrow';
+    for(let k=0;k<T6NB;k++){ ackEl(p,k).className='ackchip pending'; ackEl(p,k).innerHTML=`ack${k}`; }
+  }
+  for(let k=0;k<T6NB;k++){ barEl(k).className='bar'; bcnt(k).textContent='0/3'; }
+  banner6.className='banner'; banner6.textContent='';
+}
+async function s6(ms,my){ await sleep(ms); if(my!==t6Token) throw 0; }
+
+// nacks: how many acks each rank produces (rank index -> count)
+async function playAck(nacks, label){
+  t6reset(); const my=t6Token;
+  try{
+    cap6.innerHTML=label;
+    await s6(700,my);
+    for(let k=0;k<T6NB;k++){
+      barEl(k).classList.add('waiting');
+      let arrived=0;
+      // ranks emit ack k one by one (async arrival)
+      for(let p=0;p<3;p++){
+        await s6(520,my);
+        if(k<nacks[p]){
+          ackEl(p,k).className='ackchip emit';
+          arrived++; bcnt(k).textContent=arrived+'/3';
+        }
+      }
+      await s6(400,my);
+      if(arrived===3){
+        barEl(k).className='bar fired'; bcnt(k).textContent='3/3 ✓';
+        for(let p=0;p<3;p++) ackEl(p,k).className='ackchip passed';
+        cap6.innerHTML=(window.TR||((z,e)=>z))(
+          `barrier ${k}：3 个 rank 的 ack 都到齐 → <code>all_reduce(MIN)</code> 返回 → 本轮通过。`,
+          `barrier ${k}: all 3 ranks' acks arrived → <code>all_reduce(MIN)</code> returns → this round passes.`);
+        await s6(700,my);
+      }else{
+        // a rank is missing this ack -> collective can never complete
+        barEl(k).className='bar dead'; bcnt(k).textContent=arrived+'/3 ✗';
+        for(let p=0;p<3;p++){
+          if(k<nacks[p]){ ackEl(p,k).className='ackchip wait'; document.getElementById('ackrow'+p).classList.add('blocked'); }
+          else { ackEl(p,k).className='ackchip missing'; ackEl(p,k).innerHTML=`ack${k}<span class="err">${(window.TR||((z,e)=>z))('缺失','missing')}</span>`; }
+        }
+        cap6.innerHTML=(window.TR||((z,e)=>z))(
+          `barrier ${k}：只有 <b>${arrived}/3</b> 个 rank 进了 <code>all_reduce</code>（有 rank 早早 break、少产一个 ack）→ 已到达的 rank <b style="color:var(--amber)">永远阻塞</b>在这次 collective 上。`,
+          `barrier ${k}: only <b>${arrived}/3</b> ranks entered <code>all_reduce</code> (a rank broke early and emitted one fewer ack) → the ranks that arrived are <b style="color:var(--amber)">blocked forever</b> on this collective.`);
+        banner6.className='banner bad'; banner6.textContent='💥 HANG：组2 reduce 次数不一致（3/3/2）→ collective 永不返回';
+        return;
+      }
+    }
+    banner6.className='banner ok'; banner6.textContent='✅ 安全：每 rank 都做了 '+T6NB+' 次 reduce，次数严格相等，全部对齐完成';
+    cap6.innerHTML='每个 batch 恒产 1 个 ack（出错也产）→ <b>ack 数逐 rank 相等</b> → 组2 的 collective 一一对应 → 不会 hang。';
+  }catch(e){ /* cancelled */ }
+}
+function stopTab6(){ ++t6Token; }
+document.getElementById('play6good').onclick=()=>playAck([3,3,3],
+  '<b style="color:var(--green)">正确</b>：即便某 batch 出错，<code>_page_transfer</code> 也<strong>继续循环、照常产 ack</strong> → 三个 rank 都产 3 个 ack。');
+document.getElementById('play6bad').onclick=()=>playAck([3,3,2],
+  '<b style="color:var(--red)">错误（反面教材）</b>：rank2 在 batch2 出错就 <code>break</code> → 只产 2 个 ack，比别人少一个。');
+document.getElementById('reset6').onclick=()=>{ t6reset(); cap6.innerHTML='选择场景：<b>每 batch 恒 1 ack</b> → 次数对齐、安全；<b>出错就 break</b> → ack 缺一个 → 组2 reduce 错位 → hang。'; };
+
+/* ---------- tab switching ---------- */
+const ctl1=document.querySelectorAll('.controls')[1]; // [0] is now ctl5 (story)
+const ctl2=document.getElementById('ctl2');
+const ctl3=document.getElementById('ctl3');
+const ctl4=document.getElementById('ctl4');
+const ctl5=document.getElementById('ctl5');
+const ctl6=document.getElementById('ctl6');
+document.querySelectorAll('.tab').forEach(tab=>{
+  tab.onclick=()=>{
+    document.querySelectorAll('.tab').forEach(x=>x.classList.remove('active'));
+    tab.classList.add('active');
+    const w=tab.dataset.tab;
+    document.getElementById('scene5').classList.toggle('hidden', w!=='story');
+    document.getElementById('scene1').classList.toggle('hidden', w!=='consistency');
+    document.getElementById('scene2').classList.toggle('hidden', w!=='deadlock');
+    document.getElementById('scene3').classList.toggle('hidden', w!=='skew');
+    document.getElementById('scene4').classList.toggle('hidden', w!=='threads');
+    document.getElementById('scene6').classList.toggle('hidden', w!=='ackalign');
+    ctl5.style.display = w==='story'?'flex':'none';
+    ctl1.style.display = w==='consistency'?'flex':'none';
+    ctl2.style.display = w==='deadlock'?'flex':'none';
+    ctl3.style.display = w==='skew'?'flex':'none';
+    ctl4.style.display = w==='threads'?'flex':'none';
+    ctl6.style.display = w==='ackalign'?'flex':'none';
+    if(w==='story'){ ++t1Token; stopTab3(); stopTab4(); stopTab6(); t5Paused=false; document.getElementById('play5').textContent='⏸ 暂停'; runTab5(); }
+    else if(w==='consistency'){ t1Paused=false; document.getElementById('play1').textContent='⏸ 暂停'; runTab1(); stopTab3(); stopTab4(); stopTab5(); stopTab6(); }
+    else if(w==='skew'){ ++t1Token; startTab3(true); stopTab4(); stopTab5(); stopTab6(); }
+    else if(w==='threads'){ ++t1Token; stopTab3(); stopTab5(); stopTab6(); t4Paused=false; document.getElementById('play4').textContent='⏸ 暂停'; runTab4(); }
+    else if(w==='ackalign'){ ++t1Token; stopTab3(); stopTab4(); stopTab5(); t6reset(); }
+    else{ ++t1Token; stopTab3(); stopTab4(); stopTab5(); stopTab6(); }
+  };
+});
+ctl1.style.display='none';
+ctl2.style.display='none';
+ctl3.style.display='none';
+ctl4.style.display='none';
+ctl6.style.display='none';
+resetMesh2();
+runTab5();
+</script>
+
+<script>
+/* ============================================================
+   i18n: translate by text-content (robust to innerHTML normalization).
+   PAIRS = [zhHTML, enHTML]; keys derived from stripped textContent.
+   A MutationObserver re-translates dynamic captions on the fly.
+   ============================================================ */
+(function(){
+  const PAIRS = [
+    // header
+    ['HiCache × Pipeline Parallel：树一致性 & 防死锁','HiCache × Pipeline Parallel: Tree Consistency & Deadlock Avoidance'],
+    ['拓扑 <b>PP=3 × TP=8 = 24 ranks</b> · 行=TP 组、列=PP 组 · MIN all-reduce 保证 radix tree 一致 · 2 套 gloo 组避免后台 collective 死锁',
+     'Topology <b>PP=3 × TP=8 = 24 ranks</b> · rows = TP groups, cols = PP groups · MIN all-reduce keeps radix trees identical · 2 gloo group-sets avoid background-collective deadlock'],
+    // tabs
+    ['① 两请求全流程（L3 命中/未命中 · host tree 一致）','① Two-Request Lifecycle (L3 miss/hit · host-tree consistency)'],
+    ['② 树一致性（自动播放）','② Tree Consistency (auto-play)'],
+    ['③ 为什么 2 个组不死锁','③ Why 2 Groups Avoid Deadlock'],
+    ['④ 异步时间差 × MIN 统一步调','④ Async Skew × MIN Lockstep'],
+    ['⑤ 线程关系 & 树一致性','⑤ Thread Relationships & Consistency'],
+    ['⑥ PrefetchAck 对齐 & 防 hang','⑥ PrefetchAck Alignment & Anti-Hang'],
+    // tab6 note / legend / barrier label / scenario captions / banners
+    ['每个 <b>storage batch</b> 在 <code>_page_transfer</code> 里恒产 <b>1 个 PrefetchAck</b>；<code>prefetch_sync_thread</code> 对<strong>每个 ack</strong> 在组2 做一次 <code>all_reduce(MIN)</code>。所以 <b>ack 数 = batch 数 = 组2 collective 次数</b>，必须逐 rank 相等。',
+     'Each <b>storage batch</b> in <code>_page_transfer</code> always emits <b>exactly 1 PrefetchAck</b>; <code>prefetch_sync_thread</code> does one <code>all_reduce(MIN)</code> on set 2 <strong>per ack</strong>. So <b>#acks = #batches = #set-2 collectives</b>, and it must be equal on every rank.'],
+    ['<span class="sw" style="background:var(--blue)"></span>ack 已产出（参与本轮 reduce）','<span class="sw" style="background:var(--blue)"></span>ack emitted (joins this reduce)'],
+    ['<span class="sw" style="background:var(--green)"></span>barrier 凑齐 3/3 → 通过','<span class="sw" style="background:var(--green)"></span>barrier reaches 3/3 → pass'],
+    ['<span class="sw" style="background:var(--amber)"></span>已到达，等待缺席方','<span class="sw" style="background:var(--amber)"></span>arrived, waiting for the absent rank'],
+    ['<span class="sw" style="background:var(--red)"></span>缺失 ack → 永远等不到','<span class="sw" style="background:var(--red)"></span>missing ack → never arrives'],
+    ['◆ <code>all_reduce(MIN)</code> @ 组2（prefetch_completion_sync_groups）· 每个 ack 一次 barrier',
+     '◆ <code>all_reduce(MIN)</code> @ set 2 (prefetch_completion_sync_groups) · one barrier per ack'],
+    ['选择场景：<b>每 batch 恒 1 ack</b> → 次数对齐、安全；<b>出错就 break</b> → ack 缺一个 → 组2 reduce 错位 → hang。',
+     'Pick a scenario: <b>one ack per batch</b> → counts aligned, safe; <b>break on error</b> → one ack missing → set-2 reduces misalign → hang.'],
+    ['<b style="color:var(--green)">正确</b>：即便某 batch 出错，<code>_page_transfer</code> 也<strong>继续循环、照常产 ack</strong> → 三个 rank 都产 3 个 ack。',
+     '<b style="color:var(--green)">Correct</b>: even if a batch errors, <code>_page_transfer</code> <strong>keeps looping and still emits the ack</strong> → all three ranks emit 3 acks.'],
+    ['<b style="color:var(--red)">错误（反面教材）</b>：rank2 在 batch2 出错就 <code>break</code> → 只产 2 个 ack，比别人少一个。',
+     '<b style="color:var(--red)">Wrong (anti-pattern)</b>: rank2 hits an error at batch2 and <code>break</code>s → emits only 2 acks, one fewer than the others.'],
+    ['▶ 正确（每 batch 恒 1 ack）','▶ Correct (one ack per batch)'],
+    ['▶ 错误（出错 break → ack 缺失）','▶ Wrong (break on error → missing ack)'],
+    ['每个 batch 恒产 1 个 ack（出错也产）→ <b>ack 数逐 rank 相等</b> → 组2 的 collective 一一对应 → 不会 hang。',
+     'Each batch always emits one ack (even on error) → <b>ack counts equal across ranks</b> → set-2 collectives match one-to-one → no hang.'],
+    ['✅ 安全：每 rank 都做了 3 次 reduce，次数严格相等，全部对齐完成','✅ Safe: every rank did 3 reduces, counts strictly equal, all aligned and complete'],
+    ['💥 HANG：组2 reduce 次数不一致（3/3/2）→ collective 永不返回','💥 HANG: set-2 reduce counts differ (3/3/2) → the collective never returns'],
+    // tab5 legend chips
+    ['<span class="sw" style="background:var(--blue)"></span>GPU 计算 / 插入中','<span class="sw" style="background:var(--blue)"></span>GPU compute / inserting'],
+    ['<span class="sw" style="background:var(--cyan)"></span>match 命中前缀','<span class="sw" style="background:var(--cyan)"></span>matched prefix'],
+    ['<span class="sw" style="background:var(--amber)"></span>各 rank 不一致（待 MIN 统一）','<span class="sw" style="background:var(--amber)"></span>differs across ranks (awaiting MIN)'],
+    ['<span class="sw" style="background:var(--green)"></span>已提交 / 一致','<span class="sw" style="background:var(--green)"></span>committed / consistent'],
+    ['<span class="sw" style="background:var(--red)"></span>未命中 / 淘汰删除','<span class="sw" style="background:var(--red)"></span>miss / evicted'],
+    // tab5 static labels
+    ['<b>L3 持久化存储</b>（storage backend，3 个 rank 共享视图）',
+     '<b>L3 persistent storage</b> (storage backend, shared view across 3 ranks)'],
+    ['GPU 计算','GPU compute'],
+    ['◆ MIN 组1 · prefetch_hits_sync_groups · storage_hit_count','◆ MIN set 1 · prefetch_hits_sync_groups · storage_hit_count'],
+    ['◆ MIN 组2 · prefetch_completion_sync_groups · completed_tokens','◆ MIN set 2 · prefetch_completion_sync_groups · completed_tokens'],
+    // tab5 consistency flags
+    ['✓ 3 棵 host tree 同步插入 4 个 page（一致）','✓ all 3 host trees insert 4 pages in sync (consistent)'],
+    ['✓ 3 棵 host tree 同步删除（delete 一致）','✓ all 3 host trees delete in sync (consistent)'],
+    ['⚠ 查询长度不一致（4/3/4）→ 若各自建树，host tree 会发散','⚠ query lengths differ (4/3/4) → building trees independently diverges them'],
+    ['✓ 抓取范围统一 = 3 → match_prefix 逐 rank 一致','✓ fetch range unified = 3 → match_prefix identical across ranks'],
+    ['⚠ 实际落盘不一致（3/3/2）→ 若各自插入，host tree 会发散','⚠ actual loads differ (3/3/2) → inserting independently diverges trees'],
+    ['✓ 插入长度统一 = 2 → 3 棵 host tree 完全一致','✓ insert length unified = 2 → all 3 host trees identical'],
+    // tab5 step captions
+    ['场景 <b>PP=3 × TP=4</b>：每个 PP rank 维护一棵 <b>L2 host radix tree</b>，共享底层 <b>L3 持久化存储</b>。跟踪两个请求，看 host tree 如何保持一致。',
+     'Scenario <b>PP=3 × TP=4</b>: each PP rank keeps an <b>L2 host radix tree</b> over a shared <b>L3 persistent storage</b>. We follow two requests and see how the trees stay consistent.'],
+    ['① <b>请求 A</b> 到达（需要 4 个 page 的前缀），3 个 PP rank 同时处理。',
+     '① <b>Request A</b> arrives (needs a 4-page prefix); all 3 PP ranks process it together.'],
+    ['① 查 L2 host tree → <b style="color:var(--red)">miss</b>；查 L3 → <b style="color:var(--red)">miss</b>（存储为空）。',
+     '① Query L2 host tree → <b style="color:var(--red)">miss</b>; query L3 → <b style="color:var(--red)">miss</b> (storage empty).'],
+    ['① 回退到 <b>GPU 前向计算</b>，生成这 4 个 page 的 KV。',
+     '① Fall back to <b>GPU forward compute</b> to produce the KV for these 4 pages.'],
+    ['① 计算结果写入 <b>L2 host radix tree</b> → 3 个 rank <code>insert</code> <strong style="color:var(--green)">相同</strong>的前缀 p0–p3。',
+     '① Results are written into the <b>L2 host radix tree</b> → all 3 ranks <code>insert</code> the <strong style="color:var(--green)">same</strong> prefix p0–p3.'],
+    ['① backup 线程把 L2 → <b>L3</b> 持久化（<code>write_backup</code> / <code>page_set</code>）。',
+     '① The backup thread persists L2 → <b>L3</b> (<code>write_backup</code> / <code>page_set</code>).'],
+    ['② host 内存压力 → L2 触发<strong style="color:var(--red)">淘汰</strong>（<code>evict_host</code>）。3 棵 host tree <b>完全一致</b> → 淘汰命中<strong>同一批节点</strong>；L3 仍保留。',
+     '② Host-memory pressure → L2 <strong style="color:var(--red)">eviction</strong> (<code>evict_host</code>). The 3 host trees are <b>identical</b> → eviction hits the <strong>same nodes</strong>; L3 keeps them.'],
+    ['③ <b>请求 B</b> 到达（复用 A 的前缀）。L2 host tree 已空 → <b style="color:var(--red)">L2 miss</b>，转向 L3。',
+     '③ <b>Request B</b> arrives (reuses A\u2019s prefix). The L2 host tree is empty → <b style="color:var(--red)">L2 miss</b>, fall through to L3.'],
+    ['③ <b>prefetch_thread</b> 各 rank 向 L3 查命中页数 → 结果可能<strong style="color:var(--amber)">不同</strong>（host 视图/内存差异）：4 / 3 / 4。',
+     '③ <b>prefetch_thread</b> on each rank queries L3 for its hit-page count → results may <strong style="color:var(--amber)">differ</strong> (host view / memory differences): 4 / 3 / 4.'],
+    ['◆ <b>第一个 MIN</b> @ <code>prefetch_hits_sync_groups</code>（组1，gloo/CPU，含 TP环+PP环）：<code>all_reduce(MIN)</code> 统一查询长度 = <b>3</b>。',
+     '◆ <b>First MIN</b> @ <code>prefetch_hits_sync_groups</code> (set 1, gloo/CPU, TP+PP rings): <code>all_reduce(MIN)</code> unifies the query length = <b>3</b>.'],
+    ['③ <b>prefetch_io_aux_thread</b> 逐 batch 把 page 从 L3 拉回 L2（<code>_page_transfer</code>），每 batch 产 1 个 PrefetchAck。',
+     '③ <b>prefetch_io_aux_thread</b> pulls pages L3→L2 batch by batch (<code>_page_transfer</code>), one PrefetchAck per batch.'],
+    ['③ 逐页加载<strong style="color:var(--amber)">部分失败</strong>：rank2 第 3 页 <code>page_get</code> 未成功 → completed_tokens = 3 / 3 / 2。',
+     '③ Per-page load <strong style="color:var(--amber)">partially fails</strong>: <code>page_get</code> for rank2\u2019s 3rd page fails → completed_tokens = 3 / 3 / 2.'],
+    ['◆ <b>第二个 MIN</b> @ <code>prefetch_completion_sync_groups</code>（组2，<b>独立 communicator</b>）：<code>all_reduce(MIN)</code> 统一 completed_tokens = <b>2</b>。',
+     '◆ <b>Second MIN</b> @ <code>prefetch_completion_sync_groups</code> (set 2, <b>independent communicator</b>): <code>all_reduce(MIN)</code> unifies completed_tokens = <b>2</b>.'],
+    ['③ 各 rank 只把统一的 <b>2 个 page</b> 插入 L2 host tree（<code>_insert_helper_host</code>）→ 3 棵 host tree <strong style="color:var(--green)">再次完全一致</strong>。',
+     '③ Each rank inserts only the unified <b>2 pages</b> into its L2 host tree (<code>_insert_helper_host</code>) → all 3 host trees are <strong style="color:var(--green)">identical again</strong>.'],
+    ['✅ 两套<strong>独立 gloo 组</strong>（组1 命中数、组2 完成数）+ 每 batch 恒 1 个 ack → 各 rank 对 host tree 的<strong>插入/删除完全一致</strong> → <b style="color:var(--green)">host radix tree 始终一致，后台 collective 不会死锁</b>。',
+     '✅ Two <strong>independent gloo group-sets</strong> (set 1 hit count, set 2 completed tokens) + exactly one ack per batch → every rank\u2019s <strong>inserts/deletes are identical</strong> → <b style="color:var(--green)">host radix trees stay consistent and background collectives never deadlock</b>.'],
+    // buttons
+    ['⏸ 暂停','⏸ Pause'],['▶ 播放','▶ Play'],['⟲ 重播','⟲ Replay'],['重置','Reset'],
+    ['▶ 1 套组（死锁）','▶ 1 group set (deadlock)'],['▶ 2 套组（安全）','▶ 2 group sets (safe)'],
+    // tab1 legend + tree title + init caption
+    ['<span class="sw" style="background:var(--amber)"></span>命中数被截断（不一致）','<span class="sw" style="background:var(--amber)"></span>hit count truncated (inconsistent)'],
+    ['<span class="sw" style="background:var(--blue)"></span>TP 组内 MIN 后','<span class="sw" style="background:var(--blue)"></span>after MIN within TP group'],
+    ['<span class="sw" style="background:var(--green)"></span>PP 组内 MIN 后（全局一致）','<span class="sw" style="background:var(--green)"></span>after MIN within PP group (global)'],
+    ['所有 24 个 rank 共享同一棵 radix tree','all 24 ranks share one radix tree'],
+    ['自动播放中…','auto-playing…'],
+    // tab1 captions
+    ['拓扑 <b>PP=3 × TP=8 = 24 个 rank</b>：每个 PP stage 下挂 8 个 TP rank。',
+     'Topology <b>PP=3 × TP=8 = 24 ranks</b>: each PP stage holds 8 TP ranks.'],
+    ['① 各 rank <span class="k">独立</span>向 L3 查询前缀命中。<b style="color:var(--amber)">注意 r10、r15 因 host 内存压力被截断</b>（6 / 7 页）。',
+     '① Each rank <span class="k">independently</span> queries L3 for prefix hits. <b style="color:var(--amber)">Note that r10 and r15 are truncated by host-memory pressure</b> (6 / 7 pages).'],
+    ['② 若各 rank 按自己的命中数建 radix tree → 树高不一致 → 后续 PP 集合通信 <b style="color:var(--red)">shape mismatch → crash</b>。',
+     '② If each rank builds its radix tree from its own hit count → tree heights differ → next PP collective <b style="color:var(--red)">shape mismatch → crash</b>.'],
+    ['③ 第一步：在 <span class="k">TP 组（每一行 8 个 rank）</span>内 <code>all_reduce(MIN)</code>。',
+     '③ Step 1: <code>all_reduce(MIN)</code> within each <span class="k">TP group (a row of 8 ranks)</span>.'],
+    ['③ TP 组归约后：<b>每一行变得一致</b>（PP0=8, PP1=6, PP2=8 = 各行最小值）。',
+     '③ After TP reduce: <b>each row is uniform</b> (PP0=8, PP1=6, PP2=8 = per-row min).'],
+    ['④ 第二步：在 <span class="k">PP 组（每一列 3 个 rank）</span>内 <code>all_reduce(MIN)</code> → 收敛到全局最小值。',
+     '④ Step 2: <code>all_reduce(MIN)</code> within each <span class="k">PP group (a column of 3 ranks)</span> → converge to the global minimum.'],
+    ['④ PP 组归约后：<b style="color:var(--green)">全部 24 个 rank 命中数 = 6</b>（最长公共前缀）。',
+     '④ After PP reduce: <b style="color:var(--green)">hit count = 6 on all 24 ranks</b> (longest common prefix).'],
+    ['⑤ 所有 rank 都只 prefetch / 建树到 6 → <span style="color:var(--green)">24 个 rank 的 radix tree 完全一致 ✓</span>',
+     '⑤ Every rank prefetches / builds the tree only up to 6 → <span style="color:var(--green)">all 24 radix trees are identical ✓</span>'],
+    // tab2 legend + note + groups + init + captions + banners
+    ['<span class="sw" style="background:var(--purple)"></span><b>prefetch_thread</b>（独立后台线程）· reduce(storage_hit_count)','<span class="sw" style="background:var(--purple)"></span><b>prefetch_thread</b> (independent background thread) · reduce(storage_hit_count)'],
+    ['<span class="sw" style="background:var(--cyan)"></span><b>prefetch_sync_thread</b>（独立后台线程）· reduce(completed_tokens)','<span class="sw" style="background:var(--cyan)"></span><b>prefetch_sync_thread</b> (independent background thread) · reduce(completed_tokens)'],
+    ['每个 cell = 1 个 rank，内含 2 个独立后台线程（小圆点 ●A ●B）。每一行是一个 <b>TP communicator</b>，每一列是一个 <b>PP communicator</b>。',
+     'Each cell = 1 rank, holding 2 independent background threads (dots ●A ●B). Each row is a <b>TP communicator</b>, each column a <b>PP communicator</b>.'],
+    ['<b>prefetch_hits_sync_groups</b><br>命中页数归约组（含 TP 环 + PP 环）<br><span style="font-size:11px">reduce(storage_hit_count)</span>',
+     '<b>prefetch_hits_sync_groups</b><br>hit-count reduce set (TP rings + PP rings)<br><span style="font-size:11px">reduce(storage_hit_count)</span>'],
+    ['<b>prefetch_completion_sync_groups</b><br>完成 token 归约组（含 TP 环 + PP 环）<br><span style="font-size:11px">reduce(completed_tokens)</span>',
+     '<b>prefetch_completion_sync_groups</b><br>completed-token reduce set (TP rings + PP rings)<br><span style="font-size:11px">reduce(completed_tokens)</span>'],
+    ['选择场景：用 <b>1 套组</b> 会死锁，用 <b>2 套组</b> 则安全。','Pick a scenario: <b>1 group set</b> deadlocks, <b>2 group sets</b> are safe.'],
+    ['只有 <b>1 套组</b>：prefetch_thread(A) 与 prefetch_sync_thread(B) 共用同一个 communicator 集。',
+     'Only <b>1 group set</b>: prefetch_thread(A) and prefetch_sync_thread(B) share the same communicator set.'],
+    ['两个后台线程<b>独立调度、顺序不定</b>：同一个 TP 环里，有的 rank 先发 A，有的先发 B。',
+     'The two background threads are <b>scheduled independently, order unpredictable</b>: within one TP ring some ranks post A first, others post B first.'],
+    ['同一个 communicator 上各 rank 提交的 collective <b style="color:var(--red)">不是同一个</b>（A 与 B 错位）→ rendezvous 永远配不上。',
+     'On the same communicator the collectives submitted by different ranks are <b style="color:var(--red)">not the same</b> (A vs B misaligned) → rendezvous never matches.'],
+    ['只要任一 communicator 上 A/B 交错，该环就死锁 → 全局 PP/TP 通信连环卡住。',
+     'If A/B interleave on any communicator, that ring deadlocks → all PP/TP communication hangs in a chain.'],
+    ['💥 DEADLOCK — 整个 24-rank job 卡死','💥 DEADLOCK — the whole 24-rank job hangs'],
+    ['用 <b>2 套独立组</b>：<b style="color:var(--purple)">A 永远走 prefetch_hits_sync_groups</b>，<b style="color:var(--cyan)">B 永远走 prefetch_completion_sync_groups</b>。',
+     'With <b>2 independent group sets</b>: <b style="color:var(--purple)">A always uses prefetch_hits_sync_groups</b>, <b style="color:var(--cyan)">B always uses prefetch_completion_sync_groups</b>.'],
+    ['第一波：所有 rank 的 <b>prefetch_thread</b> 只在 <code>prefetch_hits_sync_groups</code> 上提交 A → 序列一致。',
+     'Wave 1: every rank\u2019s <b>prefetch_thread</b> posts A only on <code>prefetch_hits_sync_groups</code> → consistent order.'],
+    ['✓ TP 环 + PP 环上 A 全部到齐 → 第一波归约完成。','✓ A arrives on every TP ring + PP ring → wave 1 reduce done.'],
+    ['第二波：所有 rank 的 <b>prefetch_sync_thread</b> 只在 <code>prefetch_completion_sync_groups</code> 上提交 B → 序列一致。',
+     'Wave 2: every rank\u2019s <b>prefetch_sync_thread</b> posts B only on <code>prefetch_completion_sync_groups</code> → consistent order.'],
+    ['每个 communicator 上的 collective 序列在所有 rank <b style="color:var(--green)">完全一致</b>（A→组1、B→组2 不交叉）→ 不会死锁。',
+     'The collective sequence on each communicator is <b style="color:var(--green)">identical across ranks</b> (A→set1, B→set2, never crossing) → no deadlock.'],
+    ['✅ 安全 — 24 个 rank 全部对齐完成','✅ Safe — all 24 ranks aligned and complete'],
+    // tab3 titles / conduits / flow-note / lane hint / captions / formers
+    ['③ 主 PP 流水线执行<strong>时序</strong> <span class="tag gpu">NCCL · GPU</span> <span style="color:var(--muted);font-size:11px;">时序连续、错峰流动，<strong style="color:var(--green)">不被后台 prefetch 同步打断</strong></span>',
+     '③ Main PP pipeline execution <strong>timing</strong> <span class="tag gpu">NCCL · GPU</span> <span style="color:var(--muted);font-size:11px;">continuous, staggered flow, <strong style="color:var(--green)">never interrupted by background prefetch sync</strong></span>'],
+    ['↑ 流水线跑的正是②组好的 <strong>mb0→mb3</strong>，沿 stage0→1→2 错峰对角推进',
+     '↑ The pipeline runs exactly the <strong>mb0→mb3</strong> composed in ②, advancing diagonally stage0→1→2'],
+    ['▲ 组好的 <b>batch &amp; micro-batch 顺序</b> 喂给流水线（内容）','▲ The composed <b>batch &amp; micro-batch order</b> feeds the pipeline (content)'],
+    ['② 三个 PP rank 用<strong>同一个 storage hit</strong> 组 batch（内容必须逐 rank 一致）',
+     '② The three PP ranks compose the batch from <strong>the same storage hit</strong> (content must match per rank)'],
+    ['▲ <code>all_reduce(MIN)</code> 输出统一值 <b>6</b> → 决定 batch size','▲ <code>all_reduce(MIN)</code> outputs the unified value <b>6</b> → determines batch size'],
+    ['① 异步 prefetch 查询 → <code>all_reduce(MIN)</code> <span class="tag cpu">gloo · CPU 后台线程</span>',
+     '① Async prefetch query → <code>all_reduce(MIN)</code> <span class="tag cpu">gloo · CPU background thread</span>'],
+    ['（等待②组好的 batch…）','(waiting for batch from ②…)'],
+    ['① 三个 PP rank 的 prefetch 查询<strong>异步发起</strong>（到达时刻不同）。','① The three PP ranks issue prefetch queries <strong>asynchronously</strong> (different arrival times).'],
+    ['① 先到的 rank 在 <span class="k">gloo CPU 后台线程</span>上<b style="color:var(--amber)">等待对齐</b>（不占 GPU）。',
+     '① Earlier ranks <b style="color:var(--amber)">wait to align</b> on the <span class="k">gloo CPU background thread</span> (no GPU use).'],
+    ['① <b style="color:var(--amber)">pp2 最慢</b>到达 → <code>all_reduce(MIN)</code> 把 8/6/7 <strong style="color:var(--green)">统一成 6</strong>。',
+     '① <b style="color:var(--amber)">pp2 is slowest</b> to arrive → <code>all_reduce(MIN)</code> unifies 8/6/7 <strong style="color:var(--green)">into 6</strong>.'],
+    ['② 统一后的 <b>storage hit = 6</b> 下发给各 rank 调度器 → 决定<strong>已缓存前缀长度 / batch size / micro-batch 顺序</strong>（mb0→mb3）。',
+     '② The unified <b>storage hit = 6</b> goes to each rank\u2019s scheduler → determines <strong>cached prefix length / batch size / micro-batch order</strong> (mb0→mb3).'],
+    ['③ 三个 rank 因拿到<strong style="color:var(--green)">同一个 6</strong> 而组出<strong style="color:var(--green)">完全一致的 batch 与 mb 顺序</strong>，喂给 PP 流水线；执行时序连续不被打断。<br><span style="color:var(--red)">⚠ 若 storage hit 不统一 → batch/mb 顺序逐 rank 发散 → PP 调度错位、卡死。</span>',
+     '③ Because all three ranks get <strong style="color:var(--green)">the same 6</strong>, they compose <strong style="color:var(--green)">identical batches and mb order</strong>, which are fed to the PP pipeline; execution timing stays continuous and uninterrupted.<br><span style="color:var(--red)">⚠ If the storage hit weren\u2019t unified → batch/mb order would diverge per rank → PP scheduling mismatch &amp; hang.</span>'],
+    // formers
+    ['调度器 · PP rank 0','Scheduler · PP rank 0'],['调度器 · PP rank 1','Scheduler · PP rank 1'],['调度器 · PP rank 2','Scheduler · PP rank 2'],
+    ['已缓存前缀 storage hit = ','cached prefix storage hit = '],
+    [' 页 → 决定 batch 组成',' pages → determines batch'],
+    ['✓ batch &amp; mb 顺序一致','✓ identical batch &amp; mb order'],
+    // mesh labels
+    ['<b>PP stage 0</b><br>(TP 组)','<b>PP stage 0</b><br>(TP group)'],
+    ['<b>PP stage 1</b><br>(TP 组)','<b>PP stage 1</b><br>(TP group)'],
+    ['<b>PP stage 2</b><br>(TP 组)','<b>PP stage 2</b><br>(TP group)'],
+    ['PP 组(列)<br>每列跨 3 个 stage →','PP groups (cols)<br>each spans 3 stages →'],
+    // tab4 flow boxes
+    ['调度器 Scheduler <span class="pin">主线程</span>','Scheduler <span class="pin">main thread</span>'],
+    ['发起 prefetch 请求（writeback / load）','Issues prefetch requests (writeback / load)'],
+    ['▼ <b>prefetch_queue</b>（PrefetchOperation）','▼ <b>prefetch_queue</b> (PrefetchOperation)'],
+    ['① prefetch_thread <span class="pin">storage-hit 线程</span>','① prefetch_thread <span class="pin">storage-hit thread</span>'],
+    ['<code>_storage_hit_query()</code> 查询 L3 命中页数；命中足够→放 prefetch_buffer，不足→prefetch_revoke_queue',
+     '<code>_storage_hit_query()</code> queries the L3 hit-page count; enough hits → prefetch_buffer, too few → prefetch_revoke_queue'],
+    ['◆ all_reduce(MIN) storage_hit_count <small>@ prefetch_hits_sync_groups（组1，gloo/CPU，含 TP 环 + PP 环）</small>',
+     '◆ all_reduce(MIN) storage_hit_count <small>@ prefetch_hits_sync_groups (set 1, gloo/CPU, TP rings + PP rings)</small>'],
+    ['▼ <b>prefetch_buffer</b>','▼ <b>prefetch_buffer</b>'],
+    ['② prefetch_io_aux_thread <span class="pin">IO 加载线程</span>','② prefetch_io_aux_thread <span class="pin">IO load thread</span>'],
+    ['<code>_page_transfer()</code> 逐 batch 把页从 L3 读入 host；累加 <b>completed_tokens</b>；<b>每个 storage batch 产生 1 个 PrefetchAck</b>（出错也照常产生）',
+     '<code>_page_transfer()</code> loads pages L3→host batch by batch; accumulates <b>completed_tokens</b>; <b>each storage batch emits exactly 1 PrefetchAck</b> (even on error)'],
+    ['▼ <b>prefetch_sync_queue</b>（PrefetchAck）','▼ <b>prefetch_sync_queue</b> (PrefetchAck)'],
+    ['③ prefetch_sync_thread <span class="pin">completion-token 线程</span>','③ prefetch_sync_thread <span class="pin">completion-token thread</span>'],
+    ['对每个 ack 的 <b>completed_tokens</b> 做归约','Reduces <b>completed_tokens</b> of every ack'],
+    ['◆ all_reduce(MIN) completed_tokens <small>@ prefetch_completion_sync_groups（组2，gloo/CPU，含 TP 环 + PP 环）</small>',
+     '◆ all_reduce(MIN) completed_tokens <small>@ prefetch_completion_sync_groups (set 2, gloo/CPU, TP rings + PP rings)</small>'],
+    ['▼ <b>ack_prefetch_queue</b>','▼ <b>ack_prefetch_queue</b>'],
+    ['调度器写入 host radix tree','Scheduler inserts into host radix tree'],
+    ['只插入 <b>completed_tokens</b> 长度的前缀 → <code>_insert_helper_host()</code>','Inserts only the <b>completed_tokens</b>-long prefix → <code>_insert_helper_host()</code>'],
+    ['为什么 MIN(storage_hit) 一致？','Why does MIN(storage_hit) ensure consistency?'],
+    ['各 rank 命中可能不同（host 内存截断、L3 视图差异）。MIN 取<b>最长公共可命中前缀</b> → 所有 rank <b>抓取范围一致</b>，不会各抓不同长度。',
+     'Hits may differ per rank (host-mem truncation, L3 view differences). MIN takes the <b>longest common hittable prefix</b> → every rank <b>fetches the same range</b>, never different lengths.'],
+    ['为什么 MIN(completed_tokens) 一致？','Why does MIN(completed_tokens) ensure consistency?'],
+    ['即便抓取范围一致，实际逐页加载仍可能<b>部分失败</b>（<code>page_get</code> 返回 n≠batch）。MIN 只提交<b>所有 rank 都成功落盘的最长公共前缀</b> → 写入 host tree 的长度逐 rank 相同。',
+     'Even with the same fetch range, per-page loads can <b>partially fail</b> (<code>page_get</code> returns n≠batch). MIN commits only the <b>longest common prefix every rank loaded successfully</b> → identical insert length per rank.'],
+    ['为什么不会 hang？','Why no hang?'],
+    ['每个 storage batch <b>都产生且仅产生一个 PrefetchAck</b>（即使出错也照常产生）→ 每个 rank 参与的 reduce <b>次数严格相等</b>，collective 一一对齐。两个 MIN 一起保证：<b>插入 host tree 的前缀逐 rank 完全相同 → 树一致</b>。',
+     'Each storage batch <b>emits exactly one PrefetchAck</b> (even on error) → every rank joins the <b>same number of reduces</b>, collectives align one-to-one. The two MINs together guarantee: <b>the prefix inserted into the host tree is identical per rank → trees are consistent</b>.'],
+    ['两个 MIN 同步点（组1 命中数、组2 完成数）+ 每 batch 恒定 1 个 ack，共同保证 PP 各 rank 的 host radix tree 严格一致。',
+     'Two MIN sync points (set 1 = hit count, set 2 = completed tokens) + exactly one ack per batch together keep every PP rank\u2019s host radix tree strictly identical.'],
+    // tab4 animated step captions
+    ['沿数据流向下逐步点亮：两个 MIN 同步点 + 每 batch 恒定 1 个 ack。',
+     'Light up step by step along the data flow: two MIN sync points + exactly one ack per batch.'],
+    ['调度器主线程把 prefetch 请求（writeback / load）放入队列，触发后台流水线。',
+     'The scheduler main thread enqueues a prefetch request (writeback / load), kicking off the background pipeline.'],
+    ['<b>PrefetchOperation</b> 入队 <code>prefetch_queue</code>，交给后台线程处理。',
+     'A <b>PrefetchOperation</b> enters <code>prefetch_queue</code>, handed to the background threads.'],
+    ['① <b>prefetch_thread</b> 调 <code>_storage_hit_query()</code> 查询 L3 命中页数（各 rank 可能不同）。',
+     '① <b>prefetch_thread</b> calls <code>_storage_hit_query()</code> to query the L3 hit-page count (may differ per rank).'],
+    ['◆ <b style="color:var(--amber)">第一个 MIN</b>：在 <code>prefetch_hits_sync_groups</code>（组1）对 storage_hit_count 取最小 → <b>抓取范围逐 rank 一致</b>。',
+     '◆ <b style="color:var(--amber)">First MIN</b>: take the min of storage_hit_count on <code>prefetch_hits_sync_groups</code> (set 1) → <b>the fetch range is identical per rank</b>.'],
+    ['命中足够的请求落入 <code>prefetch_buffer</code>，进入实际 IO 加载。',
+     'Requests with enough hits drop into <code>prefetch_buffer</code> for the actual IO load.'],
+    ['② <b>prefetch_io_aux_thread</b> 用 <code>_page_transfer()</code> 逐 batch 把页 L3→host；<b>每个 batch 恒产生 1 个 PrefetchAck</b>（出错也产生）。',
+     '② <b>prefetch_io_aux_thread</b> uses <code>_page_transfer()</code> to move pages L3→host batch by batch; <b>each batch always emits exactly one PrefetchAck</b> (even on error).'],
+    ['每个 batch 的 <b>PrefetchAck</b> 入队 <code>prefetch_sync_queue</code>。',
+     'Each batch\u2019s <b>PrefetchAck</b> enters <code>prefetch_sync_queue</code>.'],
+    ['③ <b>prefetch_sync_thread</b> 对每个 ack 的 <b>completed_tokens</b> 做归约。',
+     '③ <b>prefetch_sync_thread</b> reduces the <b>completed_tokens</b> of every ack.'],
+    ['◆ <b style="color:var(--green)">第二个 MIN</b>：在 <code>prefetch_completion_sync_groups</code>（组2）对 completed_tokens 取最小 → <b>真正落盘前缀逐 rank 一致</b>。',
+     '◆ <b style="color:var(--green)">Second MIN</b>: take the min of completed_tokens on <code>prefetch_completion_sync_groups</code> (set 2) → <b>the actually-loaded prefix is identical per rank</b>.'],
+    ['统一后的结果入队 <code>ack_prefetch_queue</code> 回到调度器。',
+     'The unified result enters <code>ack_prefetch_queue</code> back to the scheduler.'],
+    ['调度器只插入 <b>completed_tokens</b> 长度的前缀 → <code>_insert_helper_host()</code>。每 batch 恒 1 个 ack，<b>reduce 次数严格相等 → 不会 hang</b>。',
+     'The scheduler inserts only the <b>completed_tokens</b>-long prefix → <code>_insert_helper_host()</code>. One ack per batch means <b>equal reduce counts → no hang</b>.'],
+    ['✅ 闭环：两个 MIN（组1 命中数 + 组2 完成数）+ 每 batch 1 个 ack → <b style="color:var(--green)">PP 各 rank 的 host radix tree 严格一致</b>。',
+     '✅ Closed loop: two MINs (set 1 hit count + set 2 completed tokens) + one ack per batch → <b style="color:var(--green)">every PP rank\u2019s host radix tree is strictly identical</b>.'],
+  ];
+
+  const SEL = ['header h1','header p','.tab','.tree-title','.caption','.banner','.legend .chip',
+    '#scene2 .note','.grp','.t3-title','.clabel','.flow-note','.lane-hint','.ctl',
+    '.pp-label','.pp-foot .lab','.former h5','.former .hitbox .ht1','.former .hitbox .ht2','.former .chk',
+    '.tbox .tname','.tbox .tdesc','.tarrow','.minnode','.whycard h4','.whycard p',
+    '.gpu-badge span','.l3lab','.syncbadge','.consist-flag','#scene6 .note','.barlabel'].join(',');
+
+  const tmp=document.createElement('div');
+  const strip=h=>{ tmp.innerHTML=h; return tmp.textContent.replace(/\s+/g,' ').trim(); };
+  const EN={}, ZH={};
+  PAIRS.forEach(([zh,en])=>{ EN[strip(zh)]=en; ZH[strip(en)]=zh; });
+
+  let LANG='zh';
+  // runtime helper for dynamic strings that contain interpolated values
+  // (cannot be matched by the static dictionary). Reads the live LANG.
+  window.TR=(zh,en)=> LANG==='en' ? en : zh;
+  let mo=null;
+  let suppress=false;   // re-entrancy guard: ignore mutations we cause ourselves
+  function translateEl(el){
+    const k=strip(el.innerHTML);
+    const next = LANG==='en' ? EN[k] : ZH[k];
+    // only write when there is a real change, otherwise we churn the DOM
+    if(next!==undefined && next!==el.innerHTML) el.innerHTML=next;
+  }
+  function translateAll(){
+    suppress=true;
+    document.querySelectorAll(SEL).forEach(translateEl);
+    if(mo) mo.takeRecords();   // drop the records our own writes just generated
+    suppress=false;
+  }
+
+  window.toggleLang=function(){
+    LANG = LANG==='zh' ? 'en' : 'zh';
+    document.getElementById('langBtn').textContent = LANG==='zh' ? 'EN' : '中文';
+    document.documentElement.lang = LANG==='zh' ? 'zh-CN' : 'en';
+    translateAll();
+  };
+
+  // keep dynamic captions translated as JS rewrites them
+  mo=new MutationObserver(muts=>{
+    if(suppress) return;       // skip the mutations our own translations produced
+    suppress=true;
+    muts.forEach(m=>{
+      const tgt = m.target.nodeType===1 ? m.target : m.target.parentElement;
+      if(!tgt) return;
+      const c = tgt.closest && tgt.closest(SEL);
+      if(c) translateEl(c);
+    });
+    mo.takeRecords();
+    suppress=false;
+  });
+  mo.observe(document.body,{subtree:true,childList:true,characterData:true});
+})();
+</script>
+</body>
+</html>
+<style>#langBtn,header,.tabs{display:none!important;}body{background:#0e1117;}.wrap{padding-top:10px;}</style>
+<script>
+(function(){
+  // English-only, single-tab embed: reuse all original JS
+  try{ if(window.toggleLang) toggleLang(); }catch(e){}   // zh -> en
+  var TAB="ackalign";
+  var btn=document.querySelector('.tab[data-tab="'+TAB+'"]');
+  if(btn){ btn.click(); }
+})();
+</script>
diff --git a/public/images/blog/pp_hicache_consistency/hicache_pp_animation_en_consistency.html b/public/images/blog/pp_hicache_consistency/hicache_pp_animation_en_consistency.html
new file mode 100644
index 000000000..f1f5db69e
--- /dev/null
+++ b/public/images/blog/pp_hicache_consistency/hicache_pp_animation_en_consistency.html
@@ -0,0 +1,1559 @@
+<!DOCTYPE html>
+<html lang="zh-CN">
+<head>
+<meta charset="UTF-8" />
+<meta name="viewport" content="width=device-width, initial-scale=1.0" />
+<title>HiCache × PP=3 · TP=8: Tree Consistency &amp; Deadlock Prevention Animation</title>
+<style>
+  :root{
+    --bg:#0e1117; --panel:#161b22; --panel2:#1c2330; --line:#30363d;
+    --text:#e6edf3; --muted:#8b949e;
+    --blue:#58a6ff; --green:#3fb950; --red:#f85149; --amber:#d29922;
+    --purple:#bc8cff; --cyan:#56d4dd;
+  }
+  *{box-sizing:border-box;}
+  body{
+    margin:0; background:radial-gradient(1200px 600px at 50% -10%, #18202c, var(--bg));
+    color:var(--text); font-family:-apple-system,BlinkMacSystemFont,"Segoe UI","PingFang SC","Microsoft YaHei",sans-serif;
+    line-height:1.5;
+  }
+  #langBtn{ position:fixed; top:14px; right:16px; z-index:50; background:var(--panel2); border:1px solid var(--blue);
+    color:var(--blue); padding:7px 14px; border-radius:999px; cursor:pointer; font-size:13px; font-weight:600; }
+  #langBtn:hover{ background:var(--blue); color:#04101f; }
+  header{ text-align:center; padding:20px 16px 4px; }
+  header h1{ margin:0 0 4px; font-size:21px; }
+  header p{ margin:0; color:var(--muted); font-size:13px; }
+  .tabs{ display:flex; gap:8px; justify-content:center; margin:16px auto 8px; flex-wrap:wrap; }
+  .tab{ background:var(--panel); border:1px solid var(--line); color:var(--text);
+    padding:9px 16px; border-radius:999px; cursor:pointer; font-size:14px; transition:all .15s; }
+  .tab.active{ background:var(--blue); color:#04101f; border-color:var(--blue); font-weight:600; }
+  .wrap{ max-width:1120px; margin:0 auto; padding:0 16px 60px; }
+  .scene{ background:var(--panel); border:1px solid var(--line); border-radius:14px; padding:18px; position:relative; }
+  .hidden{ display:none; }
+  .controls{ display:flex; gap:10px; justify-content:center; align-items:center; margin:14px 0 4px; flex-wrap:wrap; }
+  button.ctl{ background:var(--panel2); border:1px solid var(--line); color:var(--text);
+    padding:8px 16px; border-radius:8px; cursor:pointer; font-size:14px; }
+  button.ctl:hover{ border-color:var(--blue); }
+  button.ctl.primary{ background:var(--green); color:#04140a; border-color:var(--green); font-weight:600; }
+  button.ctl.alt{ background:var(--red); color:#1a0606; border-color:var(--red); font-weight:600; }
+  .caption{ text-align:center; min-height:44px; margin:10px auto 0; max-width:900px; font-size:15px; }
+  .caption .k{ color:var(--cyan); font-weight:600; }
+  code{ background:#0d1117; padding:1px 6px; border-radius:4px; border:1px solid var(--line); color:var(--cyan); font-size:12px; }
+  .legend{ display:flex; gap:18px; justify-content:center; font-size:12px; color:var(--muted); flex-wrap:wrap; margin-bottom:10px; }
+  .chip{ display:inline-flex; align-items:center; gap:6px; }
+  .sw{ width:14px; height:14px; border-radius:4px; display:inline-block; }
+
+  /* ---------- mesh ---------- */
+  .mesh-head{ display:flex; align-items:center; gap:8px; margin-left:96px; margin-bottom:4px; }
+  .tp-hdr{ flex:1; display:flex; gap:6px; }
+  .tp-hdr .th{ flex:1; text-align:center; font-size:11px; color:var(--muted); }
+  .pp-row{ display:flex; align-items:center; gap:8px; margin-bottom:6px; }
+  .pp-label{ width:88px; font-size:12px; color:var(--muted); text-align:right; line-height:1.2; }
+  .pp-label b{ color:var(--text); }
+  .row-cells{ flex:1; display:flex; gap:6px; border:2px solid transparent; border-radius:10px; padding:3px; transition:border-color .3s, box-shadow .3s; }
+  .row-cells.ring-a{ border-color:var(--purple); box-shadow:0 0 12px rgba(188,140,255,.25); }
+  .row-cells.ring-b{ border-color:var(--cyan); box-shadow:0 0 12px rgba(86,212,221,.25); }
+  .row-cells.ring-bad{ border-color:var(--red); box-shadow:0 0 12px rgba(248,81,73,.3); }
+  .cell{
+    flex:1; height:54px; border-radius:8px; background:#0d1117; border:1px solid var(--line);
+    display:flex; flex-direction:column; align-items:center; justify-content:center; gap:1px;
+    transition:background .3s, border-color .3s, transform .15s, box-shadow .15s; position:relative;
+  }
+  .cell .v{ font-size:18px; font-weight:800; color:var(--muted); transition:color .3s; }
+  .cell .v small{ font-size:9px; font-weight:500; }
+  .cell .rk{ font-size:9px; color:#5b6470; }
+  .cell.varied .v{ color:var(--amber); }
+  .cell.tpmin{ background:linear-gradient(180deg,#143055,#102844); border-color:var(--blue); }
+  .cell.tpmin .v{ color:#cfe6ff; }
+  .cell.gmin{ background:linear-gradient(180deg,#0f3a1d,#0e2c18); border-color:var(--green); }
+  .cell.gmin .v{ color:#c4f7d4; }
+  .cell.sweep{ transform:translateY(-3px); box-shadow:0 0 14px rgba(88,166,255,.55); border-color:var(--blue); }
+  .cell.bad{ background:linear-gradient(180deg,#3a1414,#2a1010); border-color:var(--red); }
+  .cell.bad .v{ color:#ffd4d0; }
+  .cell.dim{ opacity:.35; }
+  /* thread dots */
+  .tdots{ display:flex; gap:5px; margin-top:1px; }
+  .td{ width:9px; height:9px; border-radius:50%; border:1px solid var(--line); background:#0d1117; transition:all .25s; }
+  .td.a{ border-color:var(--purple); }
+  .td.b{ border-color:var(--cyan); }
+  .td.a.on{ background:var(--purple); box-shadow:0 0 8px var(--purple); }
+  .td.b.on{ background:var(--cyan); box-shadow:0 0 8px var(--cyan); }
+  .td.done{ background:var(--green); border-color:var(--green); box-shadow:0 0 6px var(--green); }
+  .td.dead{ background:var(--red); border-color:var(--red); box-shadow:0 0 6px var(--red); }
+
+  /* pp-group footer (columns) */
+  .pp-foot{ display:flex; align-items:center; gap:8px; margin-top:6px; }
+  .pp-foot .lab{ width:88px; font-size:11px; color:var(--muted); text-align:right; }
+  .pp-foot .cols{ flex:1; display:flex; gap:6px; padding:0 3px; }
+  .pp-foot .col{ flex:1; height:20px; border-radius:6px; border:1px dashed var(--line); font-size:9px;
+    color:#5b6470; display:flex; align-items:center; justify-content:center; transition:all .3s; }
+  .pp-foot .col.ring-a{ border-color:var(--purple); color:#e3d3ff; }
+  .pp-foot .col.ring-b{ border-color:var(--cyan); color:#cdf6fa; }
+  .pp-foot .col.ring-bad{ border-color:var(--red); color:#ffd4d0; }
+
+  /* shared tree (tab1) */
+  .tree-box{ margin-top:14px; display:flex; flex-direction:column; align-items:center; }
+  .tree-title{ font-size:12px; color:var(--muted); margin-bottom:6px; }
+  .tree{ display:flex; flex-direction:column; align-items:center; gap:4px; min-height:30px; }
+  .tnode{ width:220px; height:20px; border-radius:5px; background:#0d1117; border:1px solid var(--line);
+    display:flex; align-items:center; justify-content:center; font-size:11px; color:var(--muted);
+    opacity:0; transform:translateY(-6px); transition:opacity .25s, transform .25s; }
+  .tnode.show{ opacity:1; transform:none; background:linear-gradient(90deg,#0f3a1d,#196b32); border-color:var(--green); color:#c4f7d4; }
+
+  /* groups panel (tab2) */
+  .groups{ display:flex; gap:16px; justify-content:center; margin-top:12px; flex-wrap:wrap; }
+  .grp{ border:1px dashed var(--line); border-radius:10px; padding:8px 14px; font-size:12px; color:var(--muted);
+    min-width:230px; text-align:center; transition:all .3s; }
+  .grp b{ color:var(--text); }
+  .grp.g1.hot{ border-color:var(--purple); color:#e3d3ff; box-shadow:0 0 14px rgba(188,140,255,.25); }
+  .grp.g2.hot{ border-color:var(--cyan); color:#cdf6fa; box-shadow:0 0 14px rgba(86,212,221,.25); }
+  .banner{ text-align:center; font-weight:800; font-size:18px; min-height:24px; margin-top:8px; }
+  .banner.ok{ color:var(--green); } .banner.bad{ color:var(--red); }
+  .note{ font-size:12px; color:var(--muted); text-align:center; margin-top:4px; }
+
+  /* ---------- tab3: async skew x MIN ---------- */
+  .t3-section{ background:var(--panel2); border:1px solid var(--line); border-radius:12px; padding:10px 12px; margin-bottom:14px; }
+  .t3-title{ font-size:13px; margin-bottom:8px; display:flex; align-items:center; gap:8px; }
+  .t3-title .tag{ font-size:10px; padding:2px 8px; border-radius:999px; border:1px solid var(--line); color:var(--muted); }
+  .t3-title .tag.gpu{ border-color:var(--blue); color:#cfe6ff; }
+  .t3-title .tag.cpu{ border-color:var(--amber); color:#ffe2ab; }
+  /* top pipeline */
+  .pipe{ position:relative; }
+  .lane{ position:relative; height:34px; margin:6px 0; border-radius:8px; background:#0d1117;
+    border:1px solid var(--line); overflow:hidden; }
+  .lane .lname{ position:absolute; left:8px; top:50%; transform:translateY(-50%); font-size:11px; color:var(--muted); z-index:3; }
+  .mb{ position:absolute; top:6px; height:22px; width:52px; border-radius:6px; z-index:2;
+    display:flex; align-items:center; justify-content:center; font-size:10px; font-weight:700;
+    opacity:0; color:#c4f7d4; border:1px solid var(--green);
+    background:linear-gradient(180deg,#0f3a1d,#15311f); transition:opacity .18s; }
+  .lane-hint{ position:absolute; right:10px; top:50%; transform:translateY(-50%); font-size:10px; color:#46505e; z-index:1; }
+  .flow-note{ font-size:11px; color:var(--green); text-align:right; margin-top:2px; }
+
+  /* bottom sync area */
+  .sync-area{ position:relative; height:170px; border-radius:8px; background:#0d1117; border:1px solid var(--line); overflow:hidden; }
+  .barrier{ position:absolute; top:6px; bottom:6px; width:0; border-left:2px dashed var(--amber); z-index:2; }
+  .barrier .blabel{ position:absolute; top:-2px; left:8px; font-size:11px; color:var(--amber); white-space:nowrap; }
+  .barrier.fire{ border-left-color:var(--green); box-shadow:-2px 0 18px rgba(63,185,80,.5); }
+  .slane{ position:absolute; left:0; right:0; height:1px; border-top:1px dashed #1f2733; }
+  .slabel{ position:absolute; left:8px; font-size:11px; color:var(--muted); z-index:3; transform:translateY(-50%); }
+  .pkt{ position:absolute; height:30px; width:108px; border-radius:8px; z-index:3; transform:translateY(-50%);
+    display:flex; align-items:center; justify-content:center; gap:6px; font-size:11px; font-weight:700;
+    border:1px solid var(--line); background:#11161f; transition:background .25s, border-color .25s, box-shadow .25s; }
+  .pkt .hv{ font-size:14px; }
+  .pkt.travel{ border-color:var(--blue); color:#cfe6ff; }
+  .pkt.wait{ border-color:var(--amber); color:#ffe2ab; background:#1d1808; animation:pulse 1s infinite; }
+  .pkt.unified{ border-color:var(--green); color:#c4f7d4; background:linear-gradient(180deg,#0f3a1d,#0e2c18); box-shadow:0 0 12px rgba(63,185,80,.35); }
+  .clock{ position:absolute; right:10px; top:8px; font-size:11px; color:var(--muted); z-index:4; }
+  /* causal arrows + batch formers */
+  .t3-conduit{ position:relative; height:38px; margin:-2px 0 8px; display:flex; align-items:center; justify-content:center; }
+  .t3-conduit .clabel{ font-size:12px; color:var(--muted); transition:color .3s; z-index:2; background:var(--panel); padding:0 8px; }
+  .t3-conduit.hot .clabel{ color:var(--green); }
+  .t3-conduit::before{ content:""; position:absolute; left:50%; top:4px; bottom:4px; width:2px; transform:translateX(-50%);
+    background:repeating-linear-gradient(to top, #2b3340 0 6px, transparent 6px 12px); }
+  .t3-conduit.hot::before{ background:repeating-linear-gradient(to top, var(--green) 0 6px, transparent 6px 12px); opacity:.5; }
+  .t3-conduit .spark{ position:absolute; left:50%; bottom:3px; width:11px; height:11px; border-radius:50%;
+    background:var(--green); opacity:0; box-shadow:0 0 12px var(--green); transform:translateX(-50%); z-index:3; }
+  .t3-conduit.hot .spark{ animation:rise .85s linear infinite; }
+  .t3-conduit.hot .spark.s2{ animation-delay:.42s; }
+  @keyframes rise{ 0%{opacity:0; transform:translate(-50%,0);} 15%{opacity:1;} 100%{opacity:0; transform:translate(-50%,-34px);} }
+  .pipe.fed .lane{ border-color:var(--green); box-shadow:0 0 10px rgba(63,185,80,.25); }
+  .formers{ display:flex; gap:14px; justify-content:center; flex-wrap:wrap; }
+  .former{ flex:1; min-width:240px; max-width:320px; background:#0d1117; border:1px solid var(--line);
+    border-radius:10px; padding:10px 12px; transition:border-color .3s, box-shadow .3s; }
+  .former h5{ margin:0 0 6px; font-size:13px; }
+  .former .hitbox{ font-size:12px; color:var(--muted); margin-bottom:8px; }
+  .former .hitbox b{ color:var(--amber); font-size:14px; }
+  .former.ready{ border-color:var(--green); box-shadow:0 0 12px rgba(63,185,80,.2); }
+  .former.ready .hitbox b{ color:var(--green); }
+  .mbrow{ display:flex; gap:6px; }
+  .mbchip{ flex:1; height:24px; border-radius:6px; background:#11161f; border:1px solid var(--line);
+    display:flex; align-items:center; justify-content:center; font-size:11px; color:var(--muted); opacity:.3; transition:all .25s; }
+  .mbchip.on{ opacity:1; background:linear-gradient(180deg,#143055,#102844); border-color:var(--blue); color:#cfe6ff; }
+  .mbchip.fixed{ opacity:1; background:linear-gradient(180deg,#0f3a1d,#0e2c18); border-color:var(--green); color:#c4f7d4; }
+  .former .chk{ font-size:12px; color:var(--green); margin-top:6px; min-height:16px; }
+
+  /* ---------- tab4: thread relationships ---------- */
+  .t4wrap{ display:flex; gap:18px; flex-wrap:wrap; }
+  .t4flow{ flex:2; min-width:340px; display:flex; flex-direction:column; align-items:stretch; gap:0; }
+  .t4why{ flex:1; min-width:280px; display:flex; flex-direction:column; gap:10px; }
+  .tbox{ border:1px solid var(--line); border-radius:10px; padding:10px 12px; background:#0d1117; position:relative; }
+  .tbox .tname{ font-size:14px; font-weight:700; display:flex; align-items:center; gap:8px; }
+  .tbox .tdesc{ font-size:11.5px; color:var(--muted); margin-top:3px; }
+  .tbox .pin{ font-size:10px; padding:1px 7px; border-radius:999px; border:1px solid var(--line); color:var(--muted); }
+  .tbox.thread-hit{ border-left:4px solid var(--purple); }
+  .tbox.thread-io{ border-left:4px solid var(--blue); }
+  .tbox.thread-sync{ border-left:4px solid var(--cyan); }
+  .tbox.sched{ border-left:4px solid var(--muted); background:#11161f; }
+  .tarrow{ text-align:center; color:var(--muted); font-size:11px; padding:5px 0; position:relative; }
+  .tarrow b{ color:var(--text); }
+  .minnode{ align-self:center; margin:6px 0; border:1.5px solid var(--amber); border-radius:999px;
+    padding:7px 16px; font-size:12px; color:#ffe2ab; background:#1d1808; font-weight:600; }
+  .minnode.g2{ border-color:var(--green); color:#c4f7d4; background:#0f2a18; }
+  .minnode small{ display:block; font-size:10px; color:var(--muted); font-weight:400; }
+  .whycard{ border:1px solid var(--line); border-radius:10px; padding:11px 13px; background:var(--panel2); }
+  .whycard h4{ margin:0 0 6px; font-size:13px; }
+  .whycard p{ margin:0; font-size:12px; color:var(--muted); }
+  .whycard.a{ border-left:4px solid var(--purple); }
+  .whycard.b{ border-left:4px solid var(--cyan); }
+  .whycard.c{ border-left:4px solid var(--green); }
+  .whycard code{ font-size:11px; }
+
+  /* ---------- tab4 animation states ---------- */
+  #scene4 .tbox,#scene4 .tarrow,#scene4 .minnode,#scene4 .whycard{ transition:all .3s; }
+  #scene4 .dimmed{ opacity:.3; }
+  .tbox.lit{ box-shadow:0 0 18px rgba(88,166,255,.45); transform:translateX(5px); }
+  .tbox.thread-hit.lit{ box-shadow:0 0 18px rgba(188,140,255,.55); }
+  .tbox.thread-io.lit{ box-shadow:0 0 18px rgba(88,166,255,.55); }
+  .tbox.thread-sync.lit{ box-shadow:0 0 18px rgba(86,212,221,.55); }
+  .tbox.sched.lit{ box-shadow:0 0 18px rgba(63,185,80,.4); }
+  .tarrow.lit{ color:var(--green); font-weight:700; }
+  .tarrow.lit b{ color:var(--green); }
+  .tarrow.lit::after{ content:" ●"; color:var(--green); animation:t4blink .7s infinite; }
+  @keyframes t4blink{ 0%,100%{opacity:.2;} 50%{opacity:1;} }
+  .minnode.lit{ box-shadow:0 0 22px rgba(210,153,34,.65); transform:scale(1.06); }
+  .minnode.g2.lit{ box-shadow:0 0 22px rgba(63,185,80,.65); }
+  .whycard.lit{ box-shadow:0 0 18px rgba(63,185,80,.35); transform:translateY(-3px); border-left-width:6px; }
+
+  /* ---------- tab5: two-request full lifecycle story ---------- */
+  #scene5 .story-top{ display:flex; gap:14px; align-items:stretch; margin-bottom:14px; }
+  .gpu-badge{ width:120px; border:1px solid var(--line); border-radius:12px; background:#0d1117;
+    display:flex; flex-direction:column; align-items:center; justify-content:center; gap:2px;
+    font-size:12px; color:var(--muted); transition:all .3s; }
+  .gpu-badge .ic{ font-size:22px; }
+  .gpu-badge.busy{ border-color:var(--blue); color:#cfe6ff; box-shadow:0 0 18px rgba(88,166,255,.45);
+    background:linear-gradient(180deg,#143055,#102844); animation:gpupulse 1s infinite; }
+  @keyframes gpupulse{ 0%,100%{box-shadow:0 0 12px rgba(88,166,255,.3);} 50%{box-shadow:0 0 22px rgba(88,166,255,.6);} }
+  .l3box{ flex:1; border:1px solid var(--line); border-radius:12px; background:#0d1117; padding:8px 12px; transition:all .3s; }
+  .l3box.hot{ border-color:var(--green); box-shadow:0 0 14px rgba(63,185,80,.25); }
+  .l3title{ font-size:12px; color:var(--muted); margin-bottom:6px; display:flex; justify-content:space-between; align-items:center; }
+  .l3title .badge{ font-size:10px; padding:1px 8px; border-radius:999px; border:1px solid var(--line); color:var(--muted); min-width:46px; text-align:center; }
+  .l3title .badge.miss{ border-color:var(--red); color:#ffd4d0; }
+  .l3title .badge.hit{ border-color:var(--green); color:#c4f7d4; }
+  .pagerow{ display:flex; gap:6px; flex-wrap:wrap; min-height:28px; }
+  .pg{ width:38px; height:26px; border-radius:6px; border:1px solid var(--line); background:#11161f;
+    display:flex; align-items:center; justify-content:center; font-size:11px; color:#5b6470;
+    transition:all .3s; opacity:0; transform:scale(.6); }
+  .pg.show{ opacity:1; transform:none; }
+  .pg.l3{ border-color:var(--green); color:#c4f7d4; background:linear-gradient(180deg,#0f3a1d,#0e2c18); }
+
+  #scene5 .ranks{ display:flex; flex-direction:column; gap:10px; }
+  .ranklane{ border:1px solid var(--line); border-radius:12px; padding:8px 12px; background:var(--panel2); transition:all .3s; }
+  .ranklane.active{ border-color:var(--blue); box-shadow:0 0 12px rgba(88,166,255,.22); }
+  .rankhdr{ display:flex; align-items:center; gap:10px; margin-bottom:6px; font-size:12px; }
+  .rankhdr .rname{ font-weight:700; color:var(--text); }
+  .rankhdr .tps{ display:flex; gap:4px; }
+  .rankhdr .tp{ width:8px; height:8px; border-radius:50%; background:#11161f; border:1px solid var(--line); transition:all .25s; }
+  .rankhdr .tp.on{ background:var(--blue); border-color:var(--blue); box-shadow:0 0 6px var(--blue); }
+  .rankhdr .rstat{ margin-left:auto; font-size:11px; padding:1px 9px; border-radius:999px; border:1px solid var(--line); color:var(--muted); }
+  .rankhdr .rstat.miss{ border-color:var(--red); color:#ffd4d0; }
+  .rankhdr .rstat.hit{ border-color:var(--green); color:#c4f7d4; }
+  .rankhdr .rstat.warn{ border-color:var(--amber); color:#ffe2ab; }
+  .htree{ display:flex; align-items:center; gap:8px; min-height:28px; }
+  .htree .root{ font-size:10px; color:var(--muted); padding:2px 7px; border:1px dashed var(--line); border-radius:6px; }
+  .htnode{ width:40px; height:26px; border-radius:6px; border:1px solid var(--line); background:#11161f;
+    display:flex; align-items:center; justify-content:center; font-size:11px; color:#5b6470; position:relative;
+    transition:all .35s; opacity:0; transform:translateY(-6px) scale(.7); }
+  .htnode::before{ content:""; position:absolute; left:-8px; top:50%; width:8px; height:1px; background:var(--line); }
+  .htnode:first-of-type::before{ display:none; }
+  .htnode.show{ opacity:1; transform:none; }
+  .htnode.committed{ border-color:var(--green); color:#c4f7d4; background:linear-gradient(180deg,#0f3a1d,#0e2c18); }
+  .htnode.inserting{ border-color:var(--blue); color:#cfe6ff; background:linear-gradient(180deg,#143055,#102844); }
+  .htnode.matched{ border-color:var(--cyan); color:#cdf6fa; box-shadow:0 0 8px rgba(86,212,221,.4); }
+  .htnode.warn{ border-color:var(--amber); color:#ffe2ab; background:#1d1808; }
+  .htnode.evict{ border-color:var(--red); color:#ffd4d0; background:linear-gradient(180deg,#3a1414,#2a1010); }
+  .story-sync{ display:flex; gap:14px; justify-content:center; margin:14px 0 6px; flex-wrap:wrap; }
+  .syncbadge{ font-size:12px; padding:6px 14px; border-radius:999px; border:1.5px solid var(--line); color:var(--muted); transition:all .3s; }
+  .syncbadge.g1.fire{ border-color:var(--amber); color:#ffe2ab; background:#1d1808; box-shadow:0 0 16px rgba(210,153,34,.45); transform:scale(1.04); }
+  .syncbadge.g2.fire{ border-color:var(--green); color:#c4f7d4; background:#0f2a18; box-shadow:0 0 16px rgba(63,185,80,.45); transform:scale(1.04); }
+  .consist-flag{ text-align:center; font-weight:700; font-size:14px; min-height:20px; margin-top:4px; }
+  .consist-flag.ok{ color:var(--green); } .consist-flag.bad{ color:var(--red); }
+
+  /* ---------- tab6: PrefetchAck alignment & anti-hang ---------- */
+  #scene6 .ackmesh{ display:flex; flex-direction:column; gap:8px; margin-bottom:6px; }
+  .ackrow{ display:flex; align-items:center; gap:10px; border:1px solid var(--line); border-radius:10px; padding:6px 10px; background:var(--panel2); transition:all .3s; }
+  .ackrow.blocked{ border-color:var(--red); box-shadow:0 0 12px rgba(248,81,73,.3); }
+  .acklabel{ width:190px; font-size:12px; color:var(--muted); }
+  .acklabel b{ color:var(--text); }
+  .ackslots{ flex:1; display:flex; gap:8px; }
+  .ackchip{ flex:1; height:34px; border-radius:8px; border:1px solid var(--line); background:#0d1117;
+    display:flex; align-items:center; justify-content:center; font-size:11px; color:#5b6470; gap:5px;
+    transition:all .3s; position:relative; }
+  .ackchip .err{ font-size:9px; color:var(--red); }
+  .ackchip.pending{ opacity:.4; }
+  .ackchip.emit{ border-color:var(--blue); color:#cfe6ff; background:linear-gradient(180deg,#143055,#102844); box-shadow:0 0 10px rgba(88,166,255,.4); }
+  .ackchip.passed{ border-color:var(--green); color:#c4f7d4; background:linear-gradient(180deg,#0f3a1d,#0e2c18); }
+  .ackchip.wait{ border-color:var(--amber); color:#ffe2ab; background:#1d1808; animation:pulse 1s infinite; }
+  .ackchip.missing{ border-color:var(--red); border-style:dashed; color:#ffd4d0; background:#2a1010; }
+  .barriers{ border:1px dashed var(--line); border-radius:10px; padding:8px 10px; margin-top:4px; }
+  .barlabel{ font-size:12px; color:var(--muted); margin-bottom:6px; text-align:center; }
+  .barrow{ display:flex; gap:8px; align-items:stretch; }
+  .barrow .barspacer{ width:190px; }
+  .barcols{ flex:1; display:flex; gap:8px; }
+  .bar{ flex:1; border:1.5px solid var(--line); border-radius:8px; padding:6px 4px; text-align:center;
+    font-size:11px; color:var(--muted); transition:all .3s; }
+  .bar .bcount{ display:block; font-size:13px; font-weight:800; margin-top:2px; color:#5b6470; }
+  .bar.waiting{ border-color:var(--amber); color:#ffe2ab; background:#1d1808; animation:pulse 1s infinite; }
+  .bar.waiting .bcount{ color:#ffe2ab; }
+  .bar.fired{ border-color:var(--green); color:#c4f7d4; background:#0f2a18; }
+  .bar.fired .bcount{ color:#c4f7d4; }
+  .bar.dead{ border-color:var(--red); color:#ffd4d0; background:#2a1010; box-shadow:0 0 12px rgba(248,81,73,.4); }
+  .bar.dead .bcount{ color:#ffd4d0; }
+</style>
+</head>
+<body>
+<button id="langBtn" onclick="toggleLang()">EN</button>
+<header>
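+  <h1>HiCache × Pipeline Parallelism: Tree Consistency & Deadlock Prevention</h1>
+  <p>Topology <b>PP=3 × TP=8 = 24 ranks</b> · rows = TP groups, columns = PP groups · MIN all-reduce keeps the radix tree consistent · 2 sets of gloo groups prevent deadlock among background collectives</p>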
+</header>
+
+<div class="tabs">
+  <button class="tab active" data-tab="story">① Two-request end-to-end flow (L3 hit / miss · consistent host tree)</button>
+  <button class="tab" data-tab="consistency">② Tree consistency (auto-play)</button>
+  <button class="tab" data-tab="deadlock">③ Why 2 groups don't deadlock</button>
+  <button class="tab" data-tab="skew">④ Async skew × MIN lockstep</button>
+  <button class="tab" data-tab="threads">⑤ Thread relationships & tree consistency</button>
+  <button class="tab" data-tab="ackalign">⑥ PrefetchAck alignment & hang prevention</button>
+</div>
+
+<div class="wrap">
+  <!-- ============ TAB 5 (story, shown first) ============ -->
+  <div class="scene" id="scene5">
+    <div class="legend">
+      <span class="chip"><span class="sw" style="background:var(--blue)"></span>GPU compute / inserting</span>
+      <span class="chip"><span class="sw" style="background:var(--cyan)"></span>matched prefix</span>
+      <span class="chip"><span class="sw" style="background:var(--amber)"></span>ranks disagree (pending MIN unification)</span>
+      <span class="chip"><span class="sw" style="background:var(--green)"></span>committed / consistent</span>
+      <span class="chip"><span class="sw" style="background:var(--red)"></span>miss / evicted</span>
+    </div>
+    <div class="story-top">
+      <div class="gpu-badge" id="gpuBadge"><span class="ic">▣</span><span>GPU compute</span></div>
+      <div class="l3box" id="l3box">
+        <div class="l3title"><span class="l3lab"><b>L3 persistent storage</b> (storage backend, view shared by the 3 ranks)</span><span class="badge" id="l3badge"></span></div>
+        <div class="pagerow" id="l3pages"></div>
+      </div>
+    </div>
+    <div class="ranks" id="ranks"></div>
+    <div class="story-sync">
+      <div class="syncbadge g1" id="s5sync1">◆ MIN group 1 · prefetch_hits_sync_groups · storage_hit_count</div>
+      <div class="syncbadge g2" id="s5sync2">◆ MIN group 2 · prefetch_completion_sync_groups · completed_tokens</div>
+    </div>
+    <div class="consist-flag" id="s5flag"></div>
+    <div class="caption" id="cap5">Auto-playing…</div>
+  </div>
+  <div class="controls" id="ctl5">
+    <button class="ctl primary" id="play5">⏸ Pause</button>
+    <button class="ctl" id="replay5">⟲ Replay</button>
+  </div>
+
+  <!-- ============ TAB 1 ============ -->
+  <div class="scene hidden" id="scene1">
+    <div class="legend">
+      <span class="chip"><span class="sw" style="background:var(--amber)"></span>hit count truncated (inconsistent)</span>
+      <span class="chip"><span class="sw" style="background:var(--blue)"></span>after MIN within the TP group</span>
+      <span class="chip"><span class="sw" style="background:var(--green)"></span>after MIN within the PP group (globally consistent)</span>
+    </div>
+    <div id="mesh1"></div>
+    <div class="tree-box">
+      <div class="tree-title" id="treeTitle">All 24 ranks share one identical radix tree</div>
+      <div class="tree" id="sharedTree"></div>
+    </div>
+    <div class="caption" id="cap1">Auto-playing…</div>
+  </div>
+  <div class="controls">
+    <button class="ctl primary" id="play1">⏸ Pause</button>
+    <button class="ctl" id="replay1">⟲ Replay</button>
+  </div>
+
+  <!-- ============ TAB 2 ============ -->
+  <div class="scene hidden" id="scene2">
+    <div class="legend">
+      <span class="chip"><span class="sw" style="background:var(--purple)"></span><b>prefetch_thread</b> (independent background thread) · reduce(storage_hit_count)</span>
+      <span class="chip"><span class="sw" style="background:var(--cyan)"></span><b>prefetch_sync_thread</b> (independent background thread) · reduce(completed_tokens)</span>
+    </div>
+    <div class="note" style="margin-bottom:8px;">Each cell = 1 rank, containing 2 independent background threads (small dots ●A ●B). Each row is a <b>TP communicator</b>; each column is a <b>PP communicator</b>.</div>
+    <div id="mesh2"></div>
+    <div class="groups">
+      <div class="grp g1" id="g1"><b>prefetch_hits_sync_groups</b><br>hit-page-count reduction groups (TP ring + PP ring)<br><span style="font-size:11px">reduce(storage_hit_count)</span></div>
+      <div class="grp g2" id="g2"><b>prefetch_completion_sync_groups</b><br>completed-token reduction groups (TP ring + PP ring)<br><span style="font-size:11px">reduce(completed_tokens)</span></div>
+    </div>
+    <div class="banner" id="banner2"></div>
+    <div class="caption" id="cap2">Pick a scenario: <b>1 set of groups</b> deadlocks; <b>2 sets of groups</b> is safe.</div>
+  </div>
+  <div class="controls" id="ctl2">
+    <button class="ctl alt" id="play1grp">▶ 1 set of groups (deadlock)</button>
+    <button class="ctl primary" id="play2grp">▶ 2 sets of groups (safe)</button>
+    <button class="ctl" id="reset2">Reset</button>
+  </div>
+
+  <!-- ============ TAB 3 ============ -->
+  <div class="scene hidden" id="scene3">
+    <!-- layer 3: pipeline timing (top) -->
+    <div class="t3-section">
+      <div class="t3-title">③ Main PP pipeline execution <strong>timeline</strong> <span class="tag gpu">NCCL · GPU</span>
+        <span style="color:var(--muted);font-size:11px;">Continuous, staggered flow, <strong style="color:var(--green)">never interrupted by background prefetch sync</strong></span>
+      </div>
+      <div class="pipe" id="pipe">
+        <div class="lane s0"><span class="lname">PP stage 0</span><span class="lane-hint" id="hint0"></span></div>
+        <div class="lane s1"><span class="lname">PP stage 1</span><span class="lane-hint" id="hint1"></span></div>
+        <div class="lane s2"><span class="lname">PP stage 2</span><span class="lane-hint" id="hint2"></span></div>
+      </div>
+      <div class="flow-note">↑ The pipeline runs exactly the <strong>mb0→mb3</strong> formed in ②, advancing diagonally in staggered fashion along stage0→1→2</div>
+    </div>
+
+    <div class="t3-conduit" id="arrow1">
+      <span class="clabel">▲ The formed <b>batch &amp; micro-batch order</b> is fed to the pipeline (content)</span>
+      <span class="spark"></span><span class="spark s2"></span>
+    </div>
+
+    <!-- layer 2: batch former (middle) -->
+    <div class="t3-section">
+      <div class="t3-title">② The three PP ranks form batches from <strong>the same storage hit</strong> (contents must match on every rank)</div>
+      <div class="formers" id="formers"></div>
+    </div>
+
+    <div class="t3-conduit" id="arrow2">
+      <span class="clabel">▲ <code>all_reduce(MIN)</code> outputs the unified value <b>6</b> → determines batch size</span>
+      <span class="spark"></span><span class="spark s2"></span>
+    </div>
+
+    <!-- layer 1: async prefetch + MIN (bottom = causal source) -->
+    <div class="t3-section">
+      <div class="t3-title">① Async prefetch query → <code>all_reduce(MIN)</code> <span class="tag cpu">gloo · CPU background thread</span></div>
+      <div class="sync-area" id="syncArea">
+        <div class="clock" id="t3clock">t = 0.0s</div>
+        <div class="barrier" id="barrier" style="left:62%"><span class="blabel">all_reduce(MIN)</span></div>
+        <div class="slabel" id="sl0">rank pp0</div><div class="pkt travel" id="pkt0"><span>pp0 storage hit</span><span class="hv">8</span></div>
+        <div class="slabel" id="sl1">rank pp1</div><div class="pkt travel" id="pkt1"><span>pp1 storage hit</span><span class="hv">6</span></div>
+        <div class="slabel" id="sl2">rank pp2</div><div class="pkt travel" id="pkt2"><span>pp2 storage hit</span><span class="hv">7</span></div>
+      </div>
+    </div>
+    <div class="caption" id="cap3">Auto-playing…</div>
+  </div>
+  <div class="controls" id="ctl3">
+    <button class="ctl primary" id="play3">⏸ Pause</button>
+    <button class="ctl" id="replay3">⟲ Replay</button>
+  </div>
+
+  <!-- ============ TAB 4 ============ -->
+  <div class="scene hidden" id="scene4">
+    <div class="t4wrap">
+      <!-- left: thread data-flow -->
+      <div class="t4flow">
+        <div class="tbox sched" id="t4b0">
+          <div class="tname">Scheduler <span class="pin">main thread</span></div>
+          <div class="tdesc">Issues prefetch requests (writeback / load)</div>
+        </div>
+        <div class="tarrow" id="t4a0">▼ <b>prefetch_queue</b> (PrefetchOperation)</div>
+
+        <div class="tbox thread-hit" id="t4b1">
+          <div class="tname">① prefetch_thread <span class="pin">storage-hit thread</span></div>
+          <div class="tdesc"><code>_storage_hit_query()</code> queries the number of L3 hit pages; enough hits → put into prefetch_buffer, too few → prefetch_revoke_queue</div>
+        </div>
+        <div class="minnode" id="t4m1">◆ all_reduce(MIN) storage_hit_count
+          <small>@ prefetch_hits_sync_groups (group 1, gloo/CPU, TP ring + PP ring)</small></div>
+        <div class="tarrow" id="t4a1">▼ <b>prefetch_buffer</b></div>
+
+        <div class="tbox thread-io" id="t4b2">
+          <div class="tname">② prefetch_io_aux_thread <span class="pin">IO load thread</span></div>
+          <div class="tdesc"><code>_page_transfer()</code> reads pages from L3 into host batch by batch; accumulates <b>completed_tokens</b>; <b>each storage batch produces 1 PrefetchAck</b> (even on error)</div>
+        </div>
+        <div class="tarrow" id="t4a2">▼ <b>prefetch_sync_queue</b> (PrefetchAck)</div>
+
+        <div class="tbox thread-sync" id="t4b3">
+          <div class="tname">③ prefetch_sync_thread <span class="pin">completion-token thread</span></div>
+          <div class="tdesc">Reduces the <b>completed_tokens</b> of each ack</div>
+        </div>
+        <div class="minnode g2" id="t4m2">◆ all_reduce(MIN) completed_tokens
+          <small>@ prefetch_completion_sync_groups (group 2, gloo/CPU, TP ring + PP ring)</small></div>
+        <div class="tarrow" id="t4a3">▼ <b>ack_prefetch_queue</b></div>
+
+        <div class="tbox sched" id="t4b4">
+          <div class="tname">Scheduler writes to the host radix tree</div>
+          <div class="tdesc">Inserts only a prefix of length <b>completed_tokens</b> → <code>_insert_helper_host()</code></div>
+        </div>
+      </div>
+
+      <!-- right: why consistent -->
+      <div class="t4why">
+        <div class="whycard a" id="t4wa">
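+          <h4>Why does MIN(storage_hit) give consistency?</h4>
+          <p>Hit counts can differ across ranks (host memory truncation, differing views of L3). MIN takes the <b>longest common hittable prefix</b> → all ranks <b>fetch the same range</b> instead of each fetching a different length.</p>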
+        </div>
+        <div class="whycard b" id="t4wb">
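+          <h4>Why does MIN(completed_tokens) give consistency?</h4>
+          <p>Even with an identical fetch range, the actual page-by-page load can still <b>partially fail</b> (<code>page_get</code> returns n≠batch). MIN commits only the <b>longest common prefix that every rank loaded successfully</b> → the length written into the host tree is the same on every rank.</p>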
+        </div>
+        <div class="whycard c" id="t4wc">
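+          <h4>Why doesn't it hang?</h4>
+          <p>Each storage batch <b>produces exactly one PrefetchAck</b> (even on error) → every rank joins <b>exactly the same number</b> of reduces, so the collectives line up one to one. Together, the two MINs guarantee that <b>the prefix inserted into the host tree is identical on every rank → the trees are consistent</b>.</p>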
+        </div>
+      </div>
+    </div>
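+    <div class="caption" id="cap4">Two MIN sync points (group 1 for hit counts, group 2 for completion counts) plus a constant 1 ack per batch together guarantee that the host radix trees on all PP ranks are strictly consistent.</div>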
+  </div>
+  <div class="controls" id="ctl4">
+    <button class="ctl primary" id="play4">⏸ Pause</button>
+    <button class="ctl" id="replay4">⟲ Replay</button>
+  </div>
+
+  <!-- ============ TAB 6 : PrefetchAck alignment & anti-hang ============ -->
+  <div class="scene hidden" id="scene6">
+    <div class="note" style="margin-bottom:10px;">Each <b>storage batch</b> always produces <b>1 PrefetchAck</b> inside <code>_page_transfer</code>; <code>prefetch_sync_thread</code> runs one <code>all_reduce(MIN)</code> in group 2 for <strong>each ack</strong>. So <b>ack count = batch count = number of group 2 collectives</b>, and it must be equal on every rank.</div>
+    <div class="legend">
+      <span class="chip"><span class="sw" style="background:var(--blue)"></span>ack emitted (joins this round's reduce)</span>
+      <span class="chip"><span class="sw" style="background:var(--green)"></span>barrier reaches 3/3 → passes</span>
+      <span class="chip"><span class="sw" style="background:var(--amber)"></span>arrived, waiting for the absent party</span>
+      <span class="chip"><span class="sw" style="background:var(--red)"></span>missing ack → waits forever</span>
+    </div>
+    <div class="ackmesh" id="ackmesh"></div>
+    <div class="barriers">
+      <div class="barlabel">◆ <code>all_reduce(MIN)</code> @ group 2 (prefetch_completion_sync_groups) · one barrier per ack</div>
+      <div class="barrow"><div class="barspacer"></div><div class="barcols" id="barcols"></div></div>
+    </div>
+    <div class="banner" id="banner6"></div>
+    <div class="caption" id="cap6">Pick a scenario: <b>always 1 ack per batch</b> → counts align, safe; <b>break on error</b> → one ack missing → group 2 reduce misaligned → hang.</div>
+  </div>
+  <div class="controls" id="ctl6">
+    <button class="ctl primary" id="play6good">▶ Correct (always 1 ack per batch)</button>
+    <button class="ctl alt" id="play6bad">▶ Wrong (break on error → ack missing)</button>
+    <button class="ctl" id="reset6">Reset</button>
+  </div>
+</div>
+
+<script>
+const PP=3, TP=8;
+// initial hit counts per (pp,tp). some truncated by host-mem pressure.
+const HITS=[
+  [8,8,8,8,8,8,8,8],
+  [8,8,6,8,8,8,8,7],   // stage1: rank(1,2)=6, rank(1,7)=7 truncated
+  [8,8,8,8,8,8,8,8],
+];
+const ROWMIN = HITS.map(r=>Math.min(...r));      // [8,6,8]
+const GMIN = Math.min(...ROWMIN);                // 6
+const sleep=ms=>new Promise(r=>setTimeout(r,ms));
+
+/* ---------- build a mesh ---------- */
+function buildMesh(containerId, withThreads){
+  const c=document.getElementById(containerId);
+  let html='<div class="mesh-head"><div class="tp-hdr">';
+  for(let t=0;t<TP;t++) html+=`<div class="th">TP ${t}</div>`;
+  html+='</div></div>';
+  for(let p=0;p<PP;p++){
+    html+=`<div class="pp-row"><div class="pp-label"><b>PP stage ${p}</b><br>(TP group)</div><div class="row-cells" id="${containerId}-row${p}">`;
+    for(let t=0;t<TP;t++){
+      html+=`<div class="cell" id="${containerId}-c${p}-${t}">
+        <div class="v">—</div>
+        <div class="rk">r${p*TP+t}</div>
+        ${withThreads?`<div class="tdots"><span class="td a" id="${containerId}-A-${p}-${t}"></span><span class="td b" id="${containerId}-B-${p}-${t}"></span></div>`:''}
+      </div>`;
+    }
+    html+='</div></div>';
+  }
+  // pp-group footer (columns)
+  html+='<div class="pp-foot"><div class="lab">PP groups (columns)<br>each column spans 3 stages →</div><div class="cols">';
+  for(let t=0;t<TP;t++) html+=`<div class="col" id="${containerId}-col${t}">r${t}·r${t+TP}·r${t+2*TP}</div>`;
+  html+='</div></div>';
+  c.innerHTML=html;
+}
+const cell=(m,p,t)=>document.getElementById(`${m}-c${p}-${t}`);
+const val =(m,p,t)=>cell(m,p,t).querySelector('.v');
+
+buildMesh('mesh1',false);
+buildMesh('mesh2',true);
+
+/* ============================================================
+   TAB 1 : auto-play loop
+   query -> diverge -> TP all_reduce(MIN) -> PP all_reduce(MIN) -> consistent tree
+   ============================================================ */
+let t1Token=0, t1Paused=false;
+const cap1=document.getElementById('cap1');
+const sharedTree=document.getElementById('sharedTree');
+
+function resetMesh1(){
+  for(let p=0;p<PP;p++) for(let t=0;t<TP;t++){
+    const cl=cell('mesh1',p,t); cl.className='cell';
+    val('mesh1',p,t).innerHTML='—';
+  }
+  sharedTree.innerHTML='';
+}
+async function gate(my){ while(t1Paused){ await sleep(120); if(my!==t1Token) throw 0; } }
+async function step(ms,my){ await sleep(ms); await gate(my); if(my!==t1Token) throw 0; }
+
+async function runTab1(){
+  const my=++t1Token;
+  try{
+    while(true){
+      resetMesh1();
+      cap1.innerHTML='Topology <b>PP=3 × TP=8 = 24 ranks</b>: each PP stage has 8 TP ranks.';
+      await step(1600,my);
+
+      // 1) independent query
+      for(let p=0;p<PP;p++) for(let t=0;t<TP;t++){
+        const v=HITS[p][t]; const cl=cell('mesh1',p,t);
+        val('mesh1',p,t).innerHTML=v+' <small>pg</small>';
+        if(v!==8) cl.classList.add('varied');
+        await step(35,my);   // cancellable: a stale run must stop here after a replay
+      }
+      cap1.innerHTML='① Each rank <span class="k">independently</span> queries L3 for prefix hits. <b style="color:var(--amber)">Note that r10 and r15 are truncated by host memory pressure</b> (6 / 7 pages).';
+      await step(2200,my);
+
+      // 2) diverge warning
+      cap1.innerHTML='② If each rank builds its radix tree from its own hit count → tree heights diverge → subsequent PP collective communication hits a <b style="color:var(--red)">shape mismatch → crash</b>.';
+      for(let p=0;p<PP;p++) for(let t=0;t<TP;t++) if(HITS[p][t]!==8){ cell('mesh1',p,t).classList.add('bad'); }
+      await step(2200,my);
+      for(let p=0;p<PP;p++) for(let t=0;t<TP;t++) cell('mesh1',p,t).classList.remove('bad','varied');
+
+      // 3) TP all_reduce(MIN) — sweep each row (all rows in parallel)
+      cap1.innerHTML='③ Step 1: <code>all_reduce(MIN)</code> within each <span class="k">TP group (8 ranks per row)</span>.';
+      for(let t=0;t<TP;t++){
+        for(let p=0;p<PP;p++) cell('mesh1',p,t).classList.add('sweep');
+        await step(110,my);
+        for(let p=0;p<PP;p++) cell('mesh1',p,t).classList.remove('sweep');
+      }
+      for(let p=0;p<PP;p++) for(let t=0;t<TP;t++){
+        const cl=cell('mesh1',p,t); cl.classList.add('tpmin');
+        val('mesh1',p,t).innerHTML=ROWMIN[p]+' <small>pg</small>';
+      }
+      cap1.innerHTML='③ After the TP-group reduction, <b>every row is internally consistent</b> (PP0=8, PP1=6, PP2=8, each the row minimum).';
+      await step(1900,my);
+
+      // 4) PP all_reduce(MIN) — sweep each column (top->bottom)
+      cap1.innerHTML='④ Step 2: <code>all_reduce(MIN)</code> within each <span class="k">PP group (the 3 ranks in a column)</span> → converges to the global minimum.';
+      for(let p=0;p<PP;p++){
+        for(let t=0;t<TP;t++) cell('mesh1',p,t).classList.add('sweep');
+        await step(180,my);
+        for(let t=0;t<TP;t++) cell('mesh1',p,t).classList.remove('sweep');
+      }
+      for(let p=0;p<PP;p++) for(let t=0;t<TP;t++){
+        const cl=cell('mesh1',p,t); cl.classList.remove('tpmin'); cl.classList.add('gmin');
+        val('mesh1',p,t).innerHTML=GMIN+' <small>pg</small>';
+      }
+      cap1.innerHTML='④ After the PP-group reduction, <b style="color:var(--green)">all 24 ranks report a hit count of 6</b> (the longest common prefix).';
+      await step(1700,my);
+
+      // 5) shared consistent tree
+      cap1.innerHTML='⑤ Every rank prefetches and builds its tree only up to 6 pages → <span style="color:var(--green)">the radix trees on all 24 ranks are identical ✓</span>';
+      sharedTree.innerHTML='';
+      for(let i=0;i<GMIN;i++){
+        const n=document.createElement('div'); n.className='tnode'; n.textContent='page '+i; sharedTree.appendChild(n);
+        await step(120,my); n.classList.add('show');
+      }
+      await step(2600,my);
+    }
+  }catch(e){ /* cancelled */ }
+}
+
+document.getElementById('play1').onclick=function(){
+  t1Paused=!t1Paused;
+  this.textContent=t1Paused?'▶ Play':'⏸ Pause';
+};
+document.getElementById('replay1').onclick=()=>{ t1Paused=false; document.getElementById('play1').textContent='⏸ Pause'; runTab1(); };
+
+/* ============================================================
+   TAB 2 : why 2 group-sets avoid deadlock (PP×TP mesh)
+   ============================================================ */
+let t2Token=0;
+const cap2=document.getElementById('cap2');
+const banner2=document.getElementById('banner2');
+const g1=document.getElementById('g1'), g2=document.getElementById('g2');
+const row2=p=>document.getElementById('mesh2-row'+p);
+const col2=t=>document.getElementById('mesh2-col'+t);
+const dotEl=(op,p,t)=>document.getElementById(`mesh2-${op}-${p}-${t}`);
+
+function resetMesh2(){
+  ++t2Token;
+  for(let p=0;p<PP;p++){
+    row2(p).className='row-cells';
+    for(let t=0;t<TP;t++){
+      const cl=cell('mesh2',p,t); cl.className='cell';
+      val('mesh2',p,t).innerHTML=GMIN+' <small>pg</small>';
+      dotEl('A',p,t).className='td a'; dotEl('B',p,t).className='td b';
+    }
+  }
+  for(let t=0;t<TP;t++) col2(t).className='col';
+  g1.classList.remove('hot'); g2.classList.remove('hot'); g2.style.opacity=1;
+  banner2.className='banner'; banner2.textContent='';
+}
+async function s2(ms,my){ await sleep(ms); if(my!==t2Token) throw 0; }
+
+/* ---- 1 shared group set -> deadlock ---- */
+async function play1Group(){
+  resetMesh2(); const my=t2Token;
+  try{
+    g2.style.opacity=.25; g1.classList.add('hot');
+    cap2.innerHTML='<b>A single group set</b>: prefetch_thread (A) and prefetch_sync_thread (B) share the same set of communicators.';
+    await s2(800,my);
+
+    // each rank's two threads race: some submit A first (purple), some B first (cyan)
+    cap2.innerHTML='The two background threads are <b>scheduled independently, in no fixed order</b>: within one TP ring, some ranks issue A first and others issue B first.';
+    for(let p=0;p<PP;p++){
+      for(let t=0;t<TP;t++){
+        const aFirst=((p+t)%2===0);      // deterministic but mixed within each row
+        const first=aFirst?'A':'B';
+        dotEl(first,p,t).classList.add('on');
+      }
+    }
+    await s2(1200,my);
+
+    // rings cannot align -> red
+    cap2.innerHTML='On the same communicator, the ranks have submitted <b style="color:var(--red)">different collectives</b> (A and B are misaligned) → the rendezvous can never match up.';
+    for(let p=0;p<PP;p++){
+      row2(p).classList.add('ring-bad');
+      for(let t=0;t<TP;t++){
+        cell('mesh2',p,t).classList.add('bad');
+        const aFirst=((p+t)%2===0);
+        dotEl(aFirst?'A':'B',p,t).className=`td ${aFirst?'a':'b'} dead`;
+      }
+    }
+    for(let t=0;t<TP;t++) col2(t).classList.add('ring-bad');
+    await s2(700,my);
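+    banner2.className='banner bad'; banner2.textContent='💥 DEADLOCK: the entire 24-rank job hangs';
+    cap2.innerHTML='As soon as A and B interleave on any communicator, that ring deadlocks → PP/TP communication stalls across the whole job in a chain reaction.';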
+  }catch(e){}
+}
+
+/* ---- 2 group sets -> safe ---- */
+async function play2Groups(){
+  resetMesh2(); const my=t2Token;
+  try{
+    g1.classList.add('hot'); g2.classList.add('hot');
+    cap2.innerHTML='With <b>2 independent group sets</b>: <b style="color:var(--purple)">A always uses prefetch_hits_sync_groups</b> and <b style="color:var(--cyan)">B always uses prefetch_completion_sync_groups</b>.';
+    await s2(800,my);
+
+    // wave A: every rank's prefetch_thread submits A to group-set-1; TP rings + PP rings light purple
+    cap2.innerHTML='Wave 1: the <b>prefetch_thread</b> on every rank submits A only on <code>prefetch_hits_sync_groups</code> → the sequence is identical everywhere.';
+    for(let p=0;p<PP;p++){
+      row2(p).classList.add('ring-a');
+      for(let t=0;t<TP;t++) dotEl('A',p,t).classList.add('on');
+    }
+    for(let t=0;t<TP;t++) col2(t).classList.add('ring-a');
+    await s2(1100,my);
+    for(let p=0;p<PP;p++){
+      row2(p).className='row-cells';
+      for(let t=0;t<TP;t++) dotEl('A',p,t).className='td a done';
+    }
+    for(let t=0;t<TP;t++) col2(t).className='col';
+    cap2.innerHTML='✓ A has arrived on every TP ring and PP ring → the first-wave reduction completes.';
+    await s2(900,my);
+
+    // wave B: prefetch_sync_thread submits B to group-set-2; rings light cyan
+    cap2.innerHTML='Wave 2: the <b>prefetch_sync_thread</b> on every rank submits B only on <code>prefetch_completion_sync_groups</code> → the sequence is identical everywhere.';
+    for(let p=0;p<PP;p++){
+      row2(p).classList.add('ring-b');
+      for(let t=0;t<TP;t++) dotEl('B',p,t).classList.add('on');
+    }
+    for(let t=0;t<TP;t++) col2(t).classList.add('ring-b');
+    await s2(1100,my);
+    for(let p=0;p<PP;p++){
+      row2(p).className='row-cells';
+      for(let t=0;t<TP;t++){ dotEl('B',p,t).className='td b done'; cell('mesh2',p,t).classList.add('gmin'); }
+    }
+    for(let t=0;t<TP;t++) col2(t).className='col';
+    banner2.className='banner ok'; banner2.textContent='✅ Safe: all 24 ranks aligned and completed';
+    cap2.innerHTML='On every communicator, the collective sequence is <b style="color:var(--green)">identical</b> across all ranks (A → set 1, B → set 2, never crossing) → no deadlock.';
+  }catch(e){}
+}
+
+document.getElementById('play1grp').onclick=play1Group;
+document.getElementById('play2grp').onclick=play2Groups;
+document.getElementById('reset2').onclick=()=>{ resetMesh2(); cap2.innerHTML='Pick a scenario: <b>1 group set</b> deadlocks; <b>2 group sets</b> are safe.'; };
+
+/* ============================================================
+   TAB 3 : async PP skew  x  all_reduce(MIN) unifies pace
+   Top: continuous, skewed micro-batch pipeline (CSS infinite) — never pauses.
+   Bottom: time-driven (rAF). Each rank's prefetch op arrives at the MIN barrier
+   at a DIFFERENT wall-clock time (async skew). Early arrivals park & wait on the
+   gloo CPU group (background thread). When the slowest arrives, one MIN flash
+   unifies all three to 6 and they depart together — while the top pipeline keeps
+   flowing untouched.
+   ============================================================ */
+const NMB=4;                          // micro-batches per batch (illustrative)
+const MB_LABELS=Array.from({length:NMB},(_,k)=>'mb'+k);
+// build top pipeline micro-batches: one controllable block per (stage, mb)
+(function buildPipe(){
+  for(let s=0;s<3;s++){
+    const lane=document.querySelector('#pipe .s'+s);
+    for(let k=0;k<NMB;k++){
+      const mb=document.createElement('div');
+      mb.className='mb'; mb.id=`pmb-${s}-${k}`; mb.textContent=MB_LABELS[k];
+      lane.appendChild(mb);
+    }
+  }
+})();
+// build batch formers (one per PP rank): hit input + ordered mb chips
+(function buildFormers(){
+  const host=document.getElementById('formers');
+  for(let p=0;p<3;p++){
+    const f=document.createElement('div'); f.className='former'; f.id='former'+p;
+    let chips='';
+    for(let k=0;k<NMB;k++) chips+=`<div class="mbchip" id="fchip-${p}-${k}">${MB_LABELS[k]}</div>`;
+    f.innerHTML=`<h5>Scheduler · PP rank ${p}</h5>
+      <div class="hitbox"><span class="ht1">cached-prefix storage hit = </span><b id="fhit${p}">？</b><span class="ht2"> pages → determines batch composition</span></div>
+      <div class="mbrow">${chips}</div>
+      <div class="chk" id="fchk${p}"></div>`;
+    host.appendChild(f);
+  }
+})();
+const arrow1=document.getElementById('arrow1');
+const arrow2=document.getElementById('arrow2');
+
+const PKT=[
+  { el:document.getElementById('pkt0'), lab:document.getElementById('sl0'), y:34,  hit:8, arrive:2.6 },
+  { el:document.getElementById('pkt1'), lab:document.getElementById('sl1'), y:85,  hit:6, arrive:1.9 }, // arrives first
+  { el:document.getElementById('pkt2'), lab:document.getElementById('sl2'), y:136, hit:7, arrive:3.9 }, // slowest -> everyone waits
+];
+const GMIN3=Math.min(...PKT.map(p=>p.hit)); // 6
+const T3={ START:0.4, SYNC:3.9, FLASH_END:4.5, DEPART_END:5.4,
+           DROP:4.5, MB_START:5.0, MB_STEP:0.35, READY:6.4, CYCLE:14.0,
+           PIPE_START:6.4, MB_LAG:1.0, STAGE_LAG:0.9, MB_TRAVEL:3.2,
+           X0:11, XB:62, X1:97 }; // seconds / percentages
+const cap3=document.getElementById('cap3');
+const barrier=document.getElementById('barrier');
+const t3clock=document.getElementById('t3clock');
+let t3raf=null, t3on=false, t3paused=false, t3start=0, t3lastT=0;
+
+function placePkt(p){ p.lab.style.top=p.y+'px'; p.el.style.top=p.y+'px'; }
+PKT.forEach(placePkt);
+function lerp(a,b,u){ return a+(b-a)*Math.max(0,Math.min(1,u)); }
+// returns progress 0..1 if t (or its wrapped form) is inside [st,en), else -1
+function pipeActive(st,en,t){
+  if(t>=st && t<en) return (t-st)/(en-st);
+  const tw=t+T3.CYCLE;
+  if(tw>=st && tw<en) return (tw-st)/(en-st);   // previous batch still draining across loop
+  return -1;
+}
+
+function resetFormers(){
+  arrow1.classList.remove('hot'); arrow2.classList.remove('hot');
+  for(let p=0;p<3;p++){
+    document.getElementById('fhit'+p).textContent='？';
+    document.getElementById('former'+p).classList.remove('ready');
+    document.getElementById('fchk'+p).textContent='';
+    for(let k=0;k<NMB;k++) document.getElementById(`fchip-${p}-${k}`).className='mbchip';
+  }
+}
+
+function t3frame(now){
+  if(!t3on){ return; }
+  if(!t3paused){
+    let t=((now - t3start)/1000) % T3.CYCLE;
+    t3lastT=t;
+    t3clock.textContent='t = '+t.toFixed(1)+'s';
+    barrier.classList.toggle('fire', (t>=T3.SYNC && t<T3.FLASH_END));
+
+    // --- layer 1: async packets toward MIN barrier ---
+    PKT.forEach(p=>{
+      let xPct, cls;
+      if(t < T3.START){ xPct=T3.X0; cls='travel'; }
+      else if(t < p.arrive){ xPct=lerp(T3.X0, T3.XB, (t-T3.START)/(p.arrive-T3.START)); cls='travel'; }
+      else if(t < T3.FLASH_END){ xPct=T3.XB; cls='wait'; }
+      else if(t < T3.DEPART_END){ xPct=lerp(T3.XB, T3.X1, (t-T3.FLASH_END)/(T3.DEPART_END-T3.FLASH_END)); cls='unified'; }
+      else { xPct=T3.X1; cls='unified'; }
+      p.el.style.left='calc('+xPct+'% - 54px)';
+      p.el.className='pkt '+cls;
+      p.el.querySelector('.hv').textContent = (t>=T3.SYNC? GMIN3 : p.hit);
+    });
+
+    // --- layer 2: batch formers driven by unified value ---
+    if(t < T3.FLASH_END){
+      resetFormers();
+    } else {
+      arrow2.classList.add('hot');                 // MIN -> value out
+      for(let p=0;p<3;p++) document.getElementById('fhit'+p).textContent=GMIN3;
+      // light mb chips in identical order across all three formers
+      let lit=0;
+      for(let k=0;k<NMB;k++){
+        if(t >= T3.MB_START + k*T3.MB_STEP){
+          for(let p=0;p<3;p++) document.getElementById(`fchip-${p}-${k}`).classList.add('on');
+          lit++;
+        }
+      }
+      if(t >= T3.READY){
+        arrow1.classList.add('hot');               // batch -> pipeline content
+        for(let p=0;p<3;p++){
+          document.getElementById('former'+p).classList.add('ready');
+          document.getElementById('fchk'+p).textContent='✓ batch & mb order identical';
+          for(let k=0;k<NMB;k++) document.getElementById(`fchip-${p}-${k}`).className='mbchip fixed';
+        }
+      } else {
+        arrow1.classList.remove('hot');
+      }
+    }
+
+    // --- layer 3: the formed mb0..mb3 flow through stage0->1->2 (diagonal pipeline) ---
+    for(let s=0;s<3;s++){
+      for(let k=0;k<NMB;k++){
+        const b=document.getElementById(`pmb-${s}-${k}`);
+        const st=T3.PIPE_START + k*T3.MB_LAG + s*T3.STAGE_LAG;
+        const u=pipeActive(st, st+T3.MB_TRAVEL, t);   // handles cycle wrap (previous batch still draining)
+        if(u>=0){ b.style.left=(19+u*73)+'%'; b.style.opacity=1; }
+        else b.style.opacity=0;
+      }
+      document.getElementById('hint'+s).textContent = (t>1.4 && t<T3.PIPE_START) ? '(waiting for the batch formed in ②…)' : '';
+    }
+
+    // --- captions ---
+    if(t < T3.START) cap3.innerHTML='① The prefetch queries on the three PP ranks are <strong>issued asynchronously</strong> (they arrive at different times).';
+    else if(t < PKT[2].arrive) cap3.innerHTML='① Ranks that arrive early <b style="color:var(--amber)">wait for alignment</b> on a <span class="k">gloo CPU background thread</span> (no GPU is held).';
+    else if(t < T3.FLASH_END) cap3.innerHTML='① <b style="color:var(--amber)">pp2 arrives last</b> → <code>all_reduce(MIN)</code> unifies 8/6/7 <strong style="color:var(--green)">to 6</strong>.';
+    else if(t < T3.READY) cap3.innerHTML='② The unified <b>storage hit = 6</b> is handed to the scheduler on each rank → it determines the <strong>cached prefix length / batch size / micro-batch order</strong> (mb0→mb3).';
+    else cap3.innerHTML='③ Because all three ranks receive <strong style="color:var(--green)">the same 6</strong>, they form <strong style="color:var(--green)">identical batches and mb order</strong> and feed them into the PP pipeline; execution keeps flowing without interruption.<br><span style="color:var(--red)">⚠ If storage hit were not unified → batch/mb order diverges rank by rank → PP scheduling misaligns and hangs.</span>';
+  }
+  t3raf=requestAnimationFrame(t3frame);
+}
+function startTab3(restart){
+  t3on=true;
+  if(restart || !t3start){ t3start=performance.now(); t3paused=false; document.getElementById('play3').textContent='⏸ Pause'; }
+  document.getElementById('pipe').classList.remove('paused');
+  if(!t3raf) t3raf=requestAnimationFrame(t3frame);
+}
+function stopTab3(){ t3on=false; if(t3raf){ cancelAnimationFrame(t3raf); t3raf=null; } }
+
+document.getElementById('play3').onclick=function(){
+  t3paused=!t3paused;
+  // Re-anchor t3start to the frozen t in both directions: t3frame leaves t3lastT untouched while paused, so resume continues from it.
+  t3start=performance.now() - t3lastT*1000;
+  this.textContent=t3paused?'▶ Play':'⏸ Pause';
+  document.getElementById('pipe').classList.toggle('paused', t3paused);
+};
+document.getElementById('replay3').onclick=()=>startTab3(true);
+
+/* ============================================================
+   TAB 4 : animated walk-through of the prefetch thread pipeline.
+   Highlights each stage in sequence; the chain "lights up" as the
+   data (PrefetchOperation → Ack → completed_tokens) flows down, and
+   the matching right-side why-card glows at each MIN sync point.
+   ============================================================ */
+let t4Token=0, t4Paused=false;
+const cap4El=document.getElementById('cap4');
+// [ids-to-light, caption]
+const T4SEQ=[
+  [['t4b0'], 'The scheduler main thread enqueues a prefetch request (writeback / load), kicking off the background pipeline.'],
+  [['t4a0'], 'A <b>PrefetchOperation</b> is pushed onto <code>prefetch_queue</code> and handed to the background thread.'],
+  [['t4b1'], '① <b>prefetch_thread</b> calls <code>_storage_hit_query()</code> to look up the number of pages hit in L3 (this may differ per rank).'],
+  [['t4m1','t4wa'], '◆ <b style="color:var(--amber)">First MIN</b>: take the minimum of storage_hit_count on <code>prefetch_hits_sync_groups</code> (set 1) → <b>the fetch range is identical on every rank</b>.'],
+  [['t4a1'], 'Requests with enough hits go into <code>prefetch_buffer</code> and proceed to the actual IO load.'],
+  [['t4b2'], '② <b>prefetch_io_aux_thread</b> uses <code>_page_transfer()</code> to move pages L3→host batch by batch; <b>every batch always yields exactly 1 PrefetchAck</b> (even on error).'],
+  [['t4a2'], 'The <b>PrefetchAck</b> for each batch is pushed onto <code>prefetch_sync_queue</code>.'],
+  [['t4b3'], '③ <b>prefetch_sync_thread</b> reduces the <b>completed_tokens</b> of each ack.'],
+  [['t4m2','t4wb'], '◆ <b style="color:var(--green)">Second MIN</b>: take the minimum of completed_tokens on <code>prefetch_completion_sync_groups</code> (set 2) → <b>the prefix that actually landed is identical on every rank</b>.'],
+  [['t4a3'], 'The unified result is pushed onto <code>ack_prefetch_queue</code> and returned to the scheduler.'],
+  [['t4b4','t4wc'], 'The scheduler inserts only a prefix of length <b>completed_tokens</b> → <code>_insert_helper_host()</code>. With exactly 1 ack per batch, <b>the reduce counts match exactly → no hang</b>.'],
+];
+function t4clear(){
+  document.querySelectorAll('#scene4 .lit').forEach(e=>e.classList.remove('lit'));
+  document.querySelectorAll('#scene4 .dimmed').forEach(e=>e.classList.remove('dimmed'));
+}
+async function t4gate(my){ while(t4Paused){ await sleep(120); if(my!==t4Token) throw 0; } }
+async function t4step(ms,my){ await sleep(ms); await t4gate(my); if(my!==t4Token) throw 0; }
+async function runTab4(){
+  const my=++t4Token;
+  try{
+    while(true){
+      t4clear();
+      cap4El.innerHTML='Light up each stage along the data flow: two MIN sync points + exactly 1 ack per batch.';
+      await t4step(1200,my);
+      for(const [ids,cap] of T4SEQ){
+        await t4gate(my);
+        ids.forEach(id=>document.getElementById(id).classList.add('lit'));
+        cap4El.innerHTML=cap;
+        await t4step(1900,my);
+      }
+      cap4El.innerHTML='✅ Closed loop: two MINs (set 1 hit count + set 2 completed count) + 1 ack per batch → <b style="color:var(--green)">the host radix trees on all PP ranks stay strictly identical</b>.';
+      await t4step(2600,my);
+    }
+  }catch(e){ /* cancelled */ }
+}
+function stopTab4(){ ++t4Token; }
+
+document.getElementById('play4').onclick=function(){
+  t4Paused=!t4Paused;
+  this.textContent=t4Paused?'▶ Play':'⏸ Pause';
+};
+document.getElementById('replay4').onclick=()=>{ t4Paused=false; document.getElementById('play4').textContent='⏸ Pause'; runTab4(); };
+
+/* ============================================================
+   TAB 5 : two-request full lifecycle (PP=3 × TP=4).
+   Req A misses (GPU compute → L2 insert → L3 backup), then L2 is
+   evicted (delete, identical across ranks); Req B hits L3 and goes
+   through the two MIN syncs so every PP rank inserts the SAME prefix
+   into its host radix tree → trees stay consistent, no deadlock.
+   ============================================================ */
+const NPG=4;                       // pages tracked in the story
+const RANK_NAMES=['PP rank 0','PP rank 1','PP rank 2'];
+(function buildStory(){
+  let h='';
+  for(let p=0;p<3;p++){
+    let tps=''; for(let t=0;t<4;t++) tps+=`<span class="tp" id="s5tp-${p}-${t}"></span>`;
+    let nodes='<span class="root">host root</span>';
+    for(let i=0;i<NPG;i++) nodes+=`<div class="htnode" id="s5n-${p}-${i}">p${i}</div>`;
+    h+=`<div class="ranklane" id="s5lane${p}">
+      <div class="rankhdr"><span class="rname">${RANK_NAMES[p]}</span><span class="tps">${tps}</span>
+        <span class="rstat" id="s5stat${p}">idle</span></div>
+      <div class="htree">${nodes}</div></div>`;
+  }
+  document.getElementById('ranks').innerHTML=h;
+  let l3=''; for(let i=0;i<NPG;i++) l3+=`<div class="pg" id="s5l3-${i}">p${i}</div>`;
+  document.getElementById('l3pages').innerHTML=l3;
+})();
+
+let t5Token=0, t5Paused=false;
+const cap5=document.getElementById('cap5');
+const s5n=(p,i)=>document.getElementById(`s5n-${p}-${i}`);
+const setNode=(p,i,cls)=>{ s5n(p,i).className='htnode show '+cls; };
+const hideNode=(p,i)=>{ s5n(p,i).className='htnode'; };
+function rstat(p,txt,cls){ const e=document.getElementById('s5stat'+p); e.className='rstat '+(cls||''); e.textContent=txt; }
+function s5flag(txt,cls){ const e=document.getElementById('s5flag'); e.className='consist-flag '+(cls||''); e.innerHTML=txt; }
+function s5reset(){
+  for(let p=0;p<3;p++){
+    document.getElementById('s5lane'+p).className='ranklane';
+    rstat(p,'idle','');
+    for(let t=0;t<4;t++) document.getElementById(`s5tp-${p}-${t}`).className='tp';
+    for(let i=0;i<NPG;i++) hideNode(p,i);
+  }
+  for(let i=0;i<NPG;i++) document.getElementById('s5l3-'+i).className='pg';
+  document.getElementById('gpuBadge').className='gpu-badge';
+  document.getElementById('l3box').className='l3box';
+  document.getElementById('l3badge').className='badge'; document.getElementById('l3badge').textContent='';
+  document.getElementById('s5sync1').className='syncbadge g1';
+  document.getElementById('s5sync2').className='syncbadge g2';
+  s5flag('','');
+}
+async function t5gate(my){ while(t5Paused){ await sleep(120); if(my!==t5Token) throw 0; } }
+async function t5step(ms,my){ await sleep(ms); await t5gate(my); if(my!==t5Token) throw 0; }
+const allRanks=fn=>{ for(let p=0;p<3;p++) fn(p); };
+
+async function runTab5(){
+  const my=++t5Token;
+  try{
+    while(true){
+      s5reset();
+      cap5.innerHTML='Scenario <b>PP=3 × TP=4</b>: each PP rank maintains its own <b>L2 host radix tree</b> on top of a shared <b>L3 persistent store</b>. We follow two requests to see how the host trees stay consistent.';
+      await t5step(2000,my);
+
+      /* ===== ACT 1 : Request A — miss → GPU → L2 insert → L3 backup ===== */
+      cap5.innerHTML='① <b>Request A</b> arrives (it needs a 4-page prefix); all 3 PP ranks process it together.';
+      allRanks(p=>{ document.getElementById('s5lane'+p).classList.add('active'); for(let t=0;t<4;t++) document.getElementById(`s5tp-${p}-${t}`).className='tp on'; rstat(p,'req A',''); });
+      await t5step(1700,my);
+
+      cap5.innerHTML='① L2 host tree lookup → <b style="color:var(--red)">miss</b>; L3 lookup → <b style="color:var(--red)">miss</b> (the store is empty).';
+      allRanks(p=>rstat(p,'L2/L3 miss','miss'));
+      document.getElementById('l3badge').className='badge miss'; document.getElementById('l3badge').textContent='miss';
+      await t5step(1900,my);
+
+      cap5.innerHTML='① Fall back to the <b>GPU forward pass</b> to compute the KV for these 4 pages.';
+      document.getElementById('gpuBadge').classList.add('busy');
+      allRanks(p=>rstat(p,'compute','warn'));
+      await t5step(1800,my);
+      document.getElementById('gpuBadge').classList.remove('busy');
+
+      cap5.innerHTML='① The results are written into the <b>L2 host radix tree</b> → all 3 ranks <code>insert</code> the <strong style="color:var(--green)">same</strong> prefix p0–p3.';
+      for(let i=0;i<NPG;i++){ allRanks(p=>setNode(p,i,'inserting')); await t5step(240,my); }
+      allRanks(p=>{ for(let i=0;i<NPG;i++) setNode(p,i,'committed'); rstat(p,'L2: 4','hit'); });
+      s5flag('✓ all 3 host trees insert 4 pages in lockstep (consistent)','ok');
+      await t5step(1600,my);
+
+      cap5.innerHTML='① The backup thread persists L2 → <b>L3</b> (<code>write_backup</code> / <code>page_set</code>).';
+      document.getElementById('l3box').classList.add('hot');
+      document.getElementById('l3badge').className='badge hit'; document.getElementById('l3badge').textContent='stored';
+      for(let i=0;i<NPG;i++){ document.getElementById('s5l3-'+i).className='pg show l3'; await t5step(220,my); }
+      s5flag('','');
+      await t5step(1500,my);
+
+      /* ===== ACT 1.5 : L2 eviction (delete consistency) ===== */
+      cap5.innerHTML='② Host memory pressure → L2 triggers <strong style="color:var(--red)">eviction</strong> (<code>evict_host</code>). The 3 host trees are <b>identical</b> → eviction hits the <strong>same set of nodes</strong>; L3 keeps its copy.';
+      for(let i=NPG-1;i>=0;i--){ allRanks(p=>setNode(p,i,'evict')); await t5step(330,my); allRanks(p=>hideNode(p,i)); }
+      allRanks(p=>rstat(p,'L2 empty',''));
+      s5flag('✓ all 3 host trees delete in lockstep (consistent delete)','ok');
+      await t5step(2000,my);
+      s5flag('','');
+
+      /* ===== ACT 2 : Request B — L3 hit → 2 MIN syncs → consistent insert ===== */
+      cap5.innerHTML='③ <b>Request B</b> arrives (reusing the prefix of A). The L2 host tree is now empty → <b style="color:var(--red)">L2 miss</b>, so fall through to L3.';
+      allRanks(p=>{ document.getElementById('s5lane'+p).classList.add('active'); rstat(p,'req B','warn'); });
+      await t5step(1700,my);
+
+      cap5.innerHTML='③ <b>prefetch_thread</b>: each rank queries L3 for its hit page count → results may <strong style="color:var(--amber)">differ</strong> (differences in host view / memory): 4 / 3 / 4.';
+      const hitq=[4,3,4];
+      allRanks(p=>{ rstat(p,'L3 hit '+hitq[p], hitq[p]===4?'hit':'warn'); for(let i=0;i<hitq[p];i++) setNode(p,i,'warn'); });
+      s5flag('⚠ query lengths disagree (4/3/4) → if each rank built its own tree, the host trees would diverge','bad');
+      await t5step(2600,my);
+
+      cap5.innerHTML='◆ <b>First MIN</b> @ <code>prefetch_hits_sync_groups</code> (set 1, gloo/CPU, covering TP rings + PP rings): <code>all_reduce(MIN)</code> unifies the query length to <b>3</b>.';
+      document.getElementById('s5sync1').classList.add('fire');
+      await t5step(1500,my);
+      allRanks(p=>{ for(let i=0;i<NPG;i++) hideNode(p,i); for(let i=0;i<3;i++) setNode(p,i,'matched'); rstat(p,'match 3','hit'); });
+      s5flag('✓ fetch range unified to 3 → match_prefix agrees on every rank','ok');
+      await t5step(2200,my);
+
+      cap5.innerHTML='③ <b>prefetch_io_aux_thread</b> pulls pages from L3 back into L2 batch by batch (<code>_page_transfer</code>), emitting 1 PrefetchAck per batch.';
+      document.getElementById('s5sync1').classList.remove('fire');
+      for(let i=0;i<3;i++){ allRanks(p=>setNode(p,i,'inserting')); await t5step(300,my); }
+      await t5step(700,my);
+
+      cap5.innerHTML='③ Page-by-page loading <strong style="color:var(--amber)">partially fails</strong>: on rank 2, <code>page_get</code> for the 3rd page does not succeed → completed_tokens = 3 / 3 / 2.';
+      const done=[3,3,2];
+      allRanks(p=>{ for(let i=0;i<3;i++){ if(i<done[p]) setNode(p,i,'committed'); else setNode(p,i,'warn'); } rstat(p,'done '+done[p], done[p]===3?'hit':'warn'); });
+      s5flag('⚠ what actually landed disagrees (3/3/2) → if each rank inserted its own result, the host trees would diverge','bad');
+      await t5step(2600,my);
+
+      cap5.innerHTML='◆ <b>Second MIN</b> @ <code>prefetch_completion_sync_groups</code> (set 2, <b>a separate communicator</b>): <code>all_reduce(MIN)</code> unifies completed_tokens to <b>2</b>.';
+      document.getElementById('s5sync2').classList.add('fire');
+      await t5step(1500,my);
+
+      cap5.innerHTML='③ Each rank inserts only the unified <b>2 pages</b> into its L2 host tree (<code>_insert_helper_host</code>) → the 3 host trees are <strong style="color:var(--green)">identical once again</strong>.';
+      allRanks(p=>{ for(let i=0;i<NPG;i++) hideNode(p,i); for(let i=0;i<2;i++) setNode(p,i,'committed'); rstat(p,'L2: 2','hit'); });
+      s5flag('✓ insert length unified to 2 → all 3 host trees identical','ok');
+      await t5step(2400,my);
+
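+      cap5.innerHTML='✅ Two <strong>independent gloo group sets</strong> (set 1 for hit counts, set 2 for completed counts) + exactly 1 ack per batch → every rank applies <strong>identical inserts and deletes</strong> to its host tree → <b style="color:var(--green)">the host radix trees stay consistent and the background collectives cannot deadlock</b>.';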
+      document.getElementById('s5sync1').classList.add('fire');
+      await t5step(3400,my);
+      document.getElementById('s5sync1').classList.remove('fire');
+      document.getElementById('s5sync2').classList.remove('fire');
+      await t5step(700,my);
+    }
+  }catch(e){ /* cancelled */ }
+}
+function stopTab5(){ ++t5Token; }
+
+document.getElementById('play5').onclick=function(){
+  t5Paused=!t5Paused;
+  this.textContent=t5Paused?'▶ Play':'⏸ Pause';
+};
+document.getElementById('replay5').onclick=()=>{ t5Paused=false; document.getElementById('play5').textContent='⏸ Pause'; runTab5(); };
+
+/* ============================================================
+   TAB 6 : PrefetchAck count alignment & anti-hang.
+   Each storage batch in _page_transfer emits exactly one PrefetchAck,
+   and prefetch_sync_thread does one all_reduce(MIN) on set-2 per ack.
+   So #acks == #batches == #set-2 collectives, and it must be equal on
+   every rank. We compare: (good) each batch always emits 1 ack even on
+   error → counts aligned → safe; (bad) break-on-error drops an ack →
+   one rank does fewer reduces → the others block forever → hang.
+   ============================================================ */
+const T6NB=3;                 // number of storage batches / acks
+(function buildAck(){
+  let m='';
+  for(let p=0;p<3;p++){
+    let slots='';
+    for(let k=0;k<T6NB;k++) slots+=`<div class="ackchip pending" id="ack-${p}-${k}">ack${k}</div>`;
+    m+=`<div class="ackrow" id="ackrow${p}"><div class="acklabel"><b>PP rank ${p}</b> · _page_transfer</div><div class="ackslots">${slots}</div></div>`;
+  }
+  document.getElementById('ackmesh').innerHTML=m;
+  let b='';
+  for(let k=0;k<T6NB;k++) b+=`<div class="bar" id="bar${k}">barrier ${k}<span class="bcount" id="bcnt${k}">0/3</span></div>`;
+  document.getElementById('barcols').innerHTML=b;
+})();
+
+let t6Token=0;
+const cap6=document.getElementById('cap6');
+const banner6=document.getElementById('banner6');
+const ackEl=(p,k)=>document.getElementById(`ack-${p}-${k}`);
+const barEl=k=>document.getElementById('bar'+k);
+const bcnt=k=>document.getElementById('bcnt'+k);
+function t6reset(){
+  ++t6Token;
+  for(let p=0;p<3;p++){
+    document.getElementById('ackrow'+p).className='ackrow';
+    for(let k=0;k<T6NB;k++){ ackEl(p,k).className='ackchip pending'; ackEl(p,k).innerHTML=`ack${k}`; }
+  }
+  for(let k=0;k<T6NB;k++){ barEl(k).className='bar'; bcnt(k).textContent='0/3'; }
+  banner6.className='banner'; banner6.textContent='';
+}
+async function s6(ms,my){ await sleep(ms); if(my!==t6Token) throw 0; }
+
+// nacks: how many acks each rank produces (rank index -> count)
+async function playAck(nacks, label){
+  t6reset(); const my=t6Token;
+  try{
+    cap6.innerHTML=label;
+    await s6(700,my);
+    for(let k=0;k<T6NB;k++){
+      barEl(k).classList.add('waiting');
+      let arrived=0;
+      // ranks emit ack k one by one (async arrival)
+      for(let p=0;p<3;p++){
+        await s6(520,my);
+        if(k<nacks[p]){
+          ackEl(p,k).className='ackchip emit';
+          arrived++; bcnt(k).textContent=arrived+'/3';
+        }
+      }
+      await s6(400,my);
+      if(arrived===3){
+        barEl(k).className='bar fired'; bcnt(k).textContent='3/3 ✓';
+        for(let p=0;p<3;p++) ackEl(p,k).className='ackchip passed';
+        cap6.innerHTML=(window.TR||((z,e)=>z))(
+          `barrier ${k}：3 个 rank 的 ack 都到齐 → <code>all_reduce(MIN)</code> 返回 → 本轮通过。`,
+          `barrier ${k}: all 3 ranks' acks arrived → <code>all_reduce(MIN)</code> returns → this round passes.`);
+        await s6(700,my);
+      }else{
+        // a rank is missing this ack -> collective can never complete
+        barEl(k).className='bar dead'; bcnt(k).textContent=arrived+'/3 ✗';
+        for(let p=0;p<3;p++){
+          if(k<nacks[p]){ ackEl(p,k).className='ackchip wait'; document.getElementById('ackrow'+p).classList.add('blocked'); }
+          else { ackEl(p,k).className='ackchip missing'; ackEl(p,k).innerHTML=`ack${k}<span class="err">${(window.TR||((z,e)=>z))('缺失','missing')}</span>`; }
+        }
+        cap6.innerHTML=(window.TR||((z,e)=>z))(
+          `barrier ${k}：只有 <b>${arrived}/3</b> 个 rank 进了 <code>all_reduce</code>（有 rank 早早 break、少产一个 ack）→ 已到达的 rank <b style="color:var(--amber)">永远阻塞</b>在这次 collective 上。`,
+          `barrier ${k}: only <b>${arrived}/3</b> ranks entered <code>all_reduce</code> (a rank broke early and emitted one fewer ack) → the ranks that arrived are <b style="color:var(--amber)">blocked forever</b> on this collective.`);
+        banner6.className='banner bad'; banner6.textContent='💥 HANG: set-2 reduce counts disagree (3/3/2) → the collective never returns';
+        return;
+      }
+    }
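+    banner6.className='banner ok'; banner6.textContent='✅ Safe: every rank performed '+T6NB+' reduces; the counts match exactly and all ranks aligned';
+    cap6.innerHTML='Every batch always emits 1 ack (even on error) → <b>ack counts are equal on every rank</b> → the set-2 collectives pair up one-to-one → no hang.';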
+  }catch(e){ /* cancelled */ }
+}
+function stopTab6(){ ++t6Token; }
+document.getElementById('play6good').onclick=()=>playAck([3,3,3],
+  '<b style="color:var(--green)">Correct</b>: even when a batch fails, <code>_page_transfer</code> <strong>keeps looping and still emits its ack</strong> → all three ranks emit 3 acks.');
+document.getElementById('play6bad').onclick=()=>playAck([3,3,2],
+  '<b style="color:var(--red)">Wrong (counterexample)</b>: rank 2 hits an error on batch 2 and <code>break</code>s → it emits only 2 acks, one fewer than the others.');
+document.getElementById('reset6').onclick=()=>{ t6reset(); cap6.innerHTML='Pick a scenario: <b>always 1 ack per batch</b> → counts align, safe; <b>break on error</b> → one ack goes missing → set-2 reduces misalign → hang.'; };
+
+/* ---------- tab switching ---------- */
+const ctl1=document.querySelectorAll('.controls')[1]; // [0] is now ctl5 (story)
+const ctl2=document.getElementById('ctl2');
+const ctl3=document.getElementById('ctl3');
+const ctl4=document.getElementById('ctl4');
+const ctl5=document.getElementById('ctl5');
+const ctl6=document.getElementById('ctl6');
+document.querySelectorAll('.tab').forEach(tab=>{
+  tab.onclick=()=>{
+    document.querySelectorAll('.tab').forEach(x=>x.classList.remove('active'));
+    tab.classList.add('active');
+    const w=tab.dataset.tab;
+    document.getElementById('scene5').classList.toggle('hidden', w!=='story');
+    document.getElementById('scene1').classList.toggle('hidden', w!=='consistency');
+    document.getElementById('scene2').classList.toggle('hidden', w!=='deadlock');
+    document.getElementById('scene3').classList.toggle('hidden', w!=='skew');
+    document.getElementById('scene4').classList.toggle('hidden', w!=='threads');
+    document.getElementById('scene6').classList.toggle('hidden', w!=='ackalign');
+    ctl5.style.display = w==='story'?'flex':'none';
+    ctl1.style.display = w==='consistency'?'flex':'none';
+    ctl2.style.display = w==='deadlock'?'flex':'none';
+    ctl3.style.display = w==='skew'?'flex':'none';
+    ctl4.style.display = w==='threads'?'flex':'none';
+    ctl6.style.display = w==='ackalign'?'flex':'none';
+    if(w==='story'){ ++t1Token; stopTab3(); stopTab4(); stopTab6(); t5Paused=false; document.getElementById('play5').textContent='⏸ 暂停'; runTab5(); }
+    else if(w==='consistency'){ t1Paused=false; document.getElementById('play1').textContent='⏸ 暂停'; runTab1(); stopTab3(); stopTab4(); stopTab5(); stopTab6(); }
+    else if(w==='skew'){ ++t1Token; startTab3(true); stopTab4(); stopTab5(); stopTab6(); }
+    else if(w==='threads'){ ++t1Token; stopTab3(); stopTab5(); stopTab6(); t4Paused=false; document.getElementById('play4').textContent='⏸ 暂停'; runTab4(); }
+    else if(w==='ackalign'){ ++t1Token; stopTab3(); stopTab4(); stopTab5(); t6reset(); }
+    else{ ++t1Token; stopTab3(); stopTab4(); stopTab5(); stopTab6(); }
+  };
+});
+ctl1.style.display='none';
+ctl2.style.display='none';
+ctl3.style.display='none';
+ctl4.style.display='none';
+ctl6.style.display='none';
+resetMesh2();
+runTab5();
+</script>
+
+<script>
+/* ============================================================
+   i18n: translate by text-content (robust to innerHTML normalization).
+   PAIRS = [zhHTML, enHTML]; keys derived from stripped textContent.
+   A MutationObserver re-translates dynamic captions on the fly.
+   ============================================================ */
+(function(){
+  const PAIRS = [
+    // header
+    ['HiCache × Pipeline Parallel：树一致性 & 防死锁','HiCache × Pipeline Parallel: Tree Consistency & Deadlock Avoidance'],
+    ['拓扑 <b>PP=3 × TP=8 = 24 ranks</b> · 行=TP 组、列=PP 组 · MIN all-reduce 保证 radix tree 一致 · 2 套 gloo 组避免后台 collective 死锁',
+     'Topology <b>PP=3 × TP=8 = 24 ranks</b> · rows = TP groups, cols = PP groups · MIN all-reduce keeps radix trees identical · 2 gloo group-sets avoid background-collective deadlock'],
+    // tabs
+    ['① 两请求全流程（L3 命中/未命中 · host tree 一致）','① Two-Request Lifecycle (L3 miss/hit · host-tree consistency)'],
+    ['② 树一致性（自动播放）','② Tree Consistency (auto-play)'],
+    ['③ 为什么 2 个组不死锁','③ Why 2 Groups Avoid Deadlock'],
+    ['④ 异步时间差 × MIN 统一步调','④ Async Skew × MIN Lockstep'],
+    ['⑤ 线程关系 & 树一致性','⑤ Thread Relationships & Consistency'],
+    ['⑥ PrefetchAck 对齐 & 防 hang','⑥ PrefetchAck Alignment & Anti-Hang'],
+    // tab6 note / legend / barrier label / scenario captions / banners
+    ['每个 <b>storage batch</b> 在 <code>_page_transfer</code> 里恒产 <b>1 个 PrefetchAck</b>；<code>prefetch_sync_thread</code> 对<strong>每个 ack</strong> 在组2 做一次 <code>all_reduce(MIN)</code>。所以 <b>ack 数 = batch 数 = 组2 collective 次数</b>，必须逐 rank 相等。',
+     'Each <b>storage batch</b> in <code>_page_transfer</code> always emits <b>exactly 1 PrefetchAck</b>; <code>prefetch_sync_thread</code> does one <code>all_reduce(MIN)</code> on set 2 <strong>per ack</strong>. So <b>#acks = #batches = #set-2 collectives</b>, and it must be equal on every rank.'],
+    ['<span class="sw" style="background:var(--blue)"></span>ack 已产出（参与本轮 reduce）','<span class="sw" style="background:var(--blue)"></span>ack emitted (joins this reduce)'],
+    ['<span class="sw" style="background:var(--green)"></span>barrier 凑齐 3/3 → 通过','<span class="sw" style="background:var(--green)"></span>barrier reaches 3/3 → pass'],
+    ['<span class="sw" style="background:var(--amber)"></span>已到达，等待缺席方','<span class="sw" style="background:var(--amber)"></span>arrived, waiting for the absent rank'],
+    ['<span class="sw" style="background:var(--red)"></span>缺失 ack → 永远等不到','<span class="sw" style="background:var(--red)"></span>missing ack → never arrives'],
+    ['◆ <code>all_reduce(MIN)</code> @ 组2（prefetch_completion_sync_groups）· 每个 ack 一次 barrier',
+     '◆ <code>all_reduce(MIN)</code> @ set 2 (prefetch_completion_sync_groups) · one barrier per ack'],
+    ['选择场景：<b>每 batch 恒 1 ack</b> → 次数对齐、安全；<b>出错就 break</b> → ack 缺一个 → 组2 reduce 错位 → hang。',
+     'Pick a scenario: <b>one ack per batch</b> → counts aligned, safe; <b>break on error</b> → one ack missing → set-2 reduces misalign → hang.'],
+    ['<b style="color:var(--green)">正确</b>：即便某 batch 出错，<code>_page_transfer</code> 也<strong>继续循环、照常产 ack</strong> → 三个 rank 都产 3 个 ack。',
+     '<b style="color:var(--green)">Correct</b>: even if a batch errors, <code>_page_transfer</code> <strong>keeps looping and still emits the ack</strong> → all three ranks emit 3 acks.'],
+    ['<b style="color:var(--red)">错误（反面教材）</b>：rank2 在 batch2 出错就 <code>break</code> → 只产 2 个 ack，比别人少一个。',
+     '<b style="color:var(--red)">Wrong (anti-pattern)</b>: rank2 hits an error at batch2 and <code>break</code>s → emits only 2 acks, one fewer than the others.'],
+    ['▶ 正确（每 batch 恒 1 ack）','▶ Correct (one ack per batch)'],
+    ['▶ 错误（出错 break → ack 缺失）','▶ Wrong (break on error → missing ack)'],
+    ['每个 batch 恒产 1 个 ack（出错也产）→ <b>ack 数逐 rank 相等</b> → 组2 的 collective 一一对应 → 不会 hang。',
+     'Each batch always emits one ack (even on error) → <b>ack counts equal across ranks</b> → set-2 collectives match one-to-one → no hang.'],
+    ['✅ 安全：每 rank 都做了 3 次 reduce，次数严格相等，全部对齐完成','✅ Safe: every rank did 3 reduces, counts strictly equal, all aligned and complete'],
+    ['💥 HANG：组2 reduce 次数不一致（3/3/2）→ collective 永不返回','💥 HANG: set-2 reduce counts differ (3/3/2) → the collective never returns'],
+    // tab5 legend chips
+    ['<span class="sw" style="background:var(--blue)"></span>GPU 计算 / 插入中','<span class="sw" style="background:var(--blue)"></span>GPU compute / inserting'],
+    ['<span class="sw" style="background:var(--cyan)"></span>match 命中前缀','<span class="sw" style="background:var(--cyan)"></span>matched prefix'],
+    ['<span class="sw" style="background:var(--amber)"></span>各 rank 不一致（待 MIN 统一）','<span class="sw" style="background:var(--amber)"></span>diverged per rank (await MIN)'],
+    ['<span class="sw" style="background:var(--green)"></span>已提交 / 一致','<span class="sw" style="background:var(--green)"></span>committed / consistent'],
+    ['<span class="sw" style="background:var(--red)"></span>未命中 / 淘汰删除','<span class="sw" style="background:var(--red)"></span>miss / evicted'],
+    // tab5 static labels
+    ['<b>L3 持久化存储</b>（storage backend，3 个 rank 共享视图）',
+     '<b>L3 persistent storage</b> (storage backend, shared view across 3 ranks)'],
+    ['GPU 计算','GPU compute'],
+    ['◆ MIN 组1 · prefetch_hits_sync_groups · storage_hit_count','◆ MIN set 1 · prefetch_hits_sync_groups · storage_hit_count'],
+    ['◆ MIN 组2 · prefetch_completion_sync_groups · completed_tokens','◆ MIN set 2 · prefetch_completion_sync_groups · completed_tokens'],
+    // tab5 consistency flags
+    ['✓ 3 棵 host tree 同步插入 4 个 page（一致）','✓ all 3 host trees insert 4 pages in sync (consistent)'],
+    ['✓ 3 棵 host tree 同步删除（delete 一致）','✓ all 3 host trees delete in sync (consistent)'],
+    ['⚠ 查询长度不一致（4/3/4）→ 若各自建树，host tree 会发散','⚠ query lengths differ (4/3/4) → if each rank built its tree independently, the host trees would diverge'],
+    ['✓ 抓取范围统一 = 3 → match_prefix 逐 rank 一致','✓ fetch range unified = 3 → match_prefix identical per rank'],
+    ['⚠ 实际落盘不一致（3/3/2）→ 若各自插入，host tree 会发散','⚠ actual loads differ (3/3/2) → if each rank inserted independently, the host trees would diverge'],
+    ['✓ 插入长度统一 = 2 → 3 棵 host tree 完全一致','✓ insert length unified = 2 → all 3 host trees identical'],
+    // tab5 step captions
+    ['场景 <b>PP=3 × TP=4</b>：每个 PP rank 维护一棵 <b>L2 host radix tree</b>，共享底层 <b>L3 持久化存储</b>。跟踪两个请求，看 host tree 如何保持一致。',
+     'Scenario <b>PP=3 × TP=4</b>: each PP rank keeps an <b>L2 host radix tree</b> over a shared <b>L3 persistent storage</b>. We follow two requests and see how the trees stay consistent.'],
+    ['① <b>请求 A</b> 到达（需要 4 个 page 的前缀），3 个 PP rank 同时处理。',
+     '① <b>Request A</b> arrives (needs a 4-page prefix); all 3 PP ranks process it together.'],
+    ['① 查 L2 host tree → <b style="color:var(--red)">miss</b>；查 L3 → <b style="color:var(--red)">miss</b>（存储为空）。',
+     '① Query L2 host tree → <b style="color:var(--red)">miss</b>; query L3 → <b style="color:var(--red)">miss</b> (storage empty).'],
+    ['① 回退到 <b>GPU 前向计算</b>，生成这 4 个 page 的 KV。',
+     '① Fall back to <b>GPU forward compute</b> to produce the KV for these 4 pages.'],
+    ['① 计算结果写入 <b>L2 host radix tree</b> → 3 个 rank <code>insert</code> <strong style="color:var(--green)">相同</strong>的前缀 p0–p3。',
+     '① Results are written into the <b>L2 host radix tree</b> → all 3 ranks <code>insert</code> the <strong style="color:var(--green)">same</strong> prefix p0–p3.'],
+    ['① backup 线程把 L2 → <b>L3</b> 持久化（<code>write_backup</code> / <code>page_set</code>）。',
+     '① The backup thread persists L2 → <b>L3</b> (<code>write_backup</code> / <code>page_set</code>).'],
+    ['② host 内存压力 → L2 触发<strong style="color:var(--red)">淘汰</strong>（<code>evict_host</code>）。3 棵 host tree <b>完全一致</b> → 淘汰命中<strong>同一批节点</strong>；L3 仍保留。',
+     '② Host-memory pressure → L2 <strong style="color:var(--red)">eviction</strong> (<code>evict_host</code>). The 3 host trees are <b>identical</b> → eviction hits the <strong>same nodes</strong>; L3 keeps them.'],
+    ['③ <b>请求 B</b> 到达（复用 A 的前缀）。L2 host tree 已空 → <b style="color:var(--red)">L2 miss</b>，转向 L3。',
+     '③ <b>Request B</b> arrives (reuses A\u2019s prefix). The L2 host tree is empty → <b style="color:var(--red)">L2 miss</b>, fall through to L3.'],
+    ['③ <b>prefetch_thread</b> 各 rank 向 L3 查命中页数 → 结果可能<strong style="color:var(--amber)">不同</strong>（host 视图/内存差异）：4 / 3 / 4。',
+     '③ <b>prefetch_thread</b> on each rank queries L3 for its hit-page count → results may <strong style="color:var(--amber)">differ</strong> (differences in host view / memory): 4 / 3 / 4.'],
+    ['◆ <b>第一个 MIN</b> @ <code>prefetch_hits_sync_groups</code>（组1，gloo/CPU，含 TP环+PP环）：<code>all_reduce(MIN)</code> 统一查询长度 = <b>3</b>。',
+     '◆ <b>First MIN</b> @ <code>prefetch_hits_sync_groups</code> (set 1, gloo/CPU, TP+PP rings): <code>all_reduce(MIN)</code> unifies the query length = <b>3</b>.'],
+    ['③ <b>prefetch_io_aux_thread</b> 逐 batch 把 page 从 L3 拉回 L2（<code>_page_transfer</code>），每 batch 产 1 个 PrefetchAck。',
+     '③ <b>prefetch_io_aux_thread</b> pulls pages L3→L2 batch by batch (<code>_page_transfer</code>), one PrefetchAck per batch.'],
+    ['③ 逐页加载<strong style="color:var(--amber)">部分失败</strong>：rank2 第 3 页 <code>page_get</code> 未成功 → completed_tokens = 3 / 3 / 2。',
+     '③ Per-page load <strong style="color:var(--amber)">partially fails</strong>: rank2\u2019s 3rd page <code>page_get</code> fails → completed_tokens = 3 / 3 / 2.'],
+    ['◆ <b>第二个 MIN</b> @ <code>prefetch_completion_sync_groups</code>（组2，<b>独立 communicator</b>）：<code>all_reduce(MIN)</code> 统一 completed_tokens = <b>2</b>。',
+     '◆ <b>Second MIN</b> @ <code>prefetch_completion_sync_groups</code> (set 2, <b>independent communicator</b>): <code>all_reduce(MIN)</code> unifies completed_tokens = <b>2</b>.'],
+    ['③ 各 rank 只把统一的 <b>2 个 page</b> 插入 L2 host tree（<code>_insert_helper_host</code>）→ 3 棵 host tree <strong style="color:var(--green)">再次完全一致</strong>。',
+     '③ Each rank inserts only the unified <b>2 pages</b> into its L2 host tree (<code>_insert_helper_host</code>) → all 3 host trees are <strong style="color:var(--green)">identical again</strong>.'],
+    ['✅ 两套<strong>独立 gloo 组</strong>（组1 命中数、组2 完成数）+ 每 batch 恒 1 个 ack → 各 rank 对 host tree 的<strong>插入/删除完全一致</strong> → <b style="color:var(--green)">host radix tree 始终一致，后台 collective 不会死锁</b>。',
+     '✅ Two <strong>independent gloo group-sets</strong> (set 1 hit count, set 2 completed tokens) + exactly one ack per batch → every rank\u2019s <strong>inserts/deletes are identical</strong> → <b style="color:var(--green)">host radix trees stay consistent and background collectives never deadlock</b>.'],
+    // buttons
+    ['⏸ 暂停','⏸ Pause'],['▶ 播放','▶ Play'],['⟲ 重播','⟲ Replay'],['重置','Reset'],
+    ['▶ 1 套组（死锁）','▶ 1 group set (deadlock)'],['▶ 2 套组（安全）','▶ 2 group sets (safe)'],
+    // tab1 legend + tree title + init caption
+    ['<span class="sw" style="background:var(--amber)"></span>命中数被截断（不一致）','<span class="sw" style="background:var(--amber)"></span>hit count truncated (inconsistent)'],
+    ['<span class="sw" style="background:var(--blue)"></span>TP 组内 MIN 后','<span class="sw" style="background:var(--blue)"></span>after MIN within TP group'],
+    ['<span class="sw" style="background:var(--green)"></span>PP 组内 MIN 后（全局一致）','<span class="sw" style="background:var(--green)"></span>after MIN within PP group (global)'],
+    ['所有 24 个 rank 共享同一棵 radix tree','all 24 ranks share one radix tree'],
+    ['自动播放中…','auto-playing…'],
+    // tab1 captions
+    ['拓扑 <b>PP=3 × TP=8 = 24 个 rank</b>：每个 PP stage 下挂 8 个 TP rank。',
+     'Topology <b>PP=3 × TP=8 = 24 ranks</b>: each PP stage holds 8 TP ranks.'],
+    ['① 各 rank <span class="k">独立</span>向 L3 查询前缀命中。<b style="color:var(--amber)">注意 r10、r15 因 host 内存压力被截断</b>（6 / 7 页）。',
+     '① Each rank <span class="k">independently</span> queries L3 for prefix hits. <b style="color:var(--amber)">Note that r10 and r15 are truncated by host-memory pressure</b> (6 / 7 pages).'],
+    ['② 若各 rank 按自己的命中数建 radix tree → 树高不一致 → 后续 PP 集合通信 <b style="color:var(--red)">shape mismatch → crash</b>。',
+     '② If each rank builds its radix tree from its own hit count → tree heights differ → the next PP collective hits a <b style="color:var(--red)">shape mismatch → crash</b>.'],
+    ['③ 第一步：在 <span class="k">TP 组（每一行 8 个 rank）</span>内 <code>all_reduce(MIN)</code>。',
+     '③ Step 1: <code>all_reduce(MIN)</code> within each <span class="k">TP group (a row of 8 ranks)</span>.'],
+    ['③ TP 组归约后：<b>每一行变得一致</b>（PP0=8, PP1=6, PP2=8 = 各行最小值）。',
+     '③ After TP reduce: <b>each row is uniform</b> (PP0=8, PP1=6, PP2=8 = per-row min).'],
+    ['④ 第二步：在 <span class="k">PP 组（每一列 3 个 rank）</span>内 <code>all_reduce(MIN)</code> → 收敛到全局最小值。',
+     '④ Step 2: <code>all_reduce(MIN)</code> within each <span class="k">PP group (a column of 3 ranks)</span> → converge to the global minimum.'],
+    ['④ PP 组归约后：<b style="color:var(--green)">全部 24 个 rank 命中数 = 6</b>（最长公共前缀）。',
+     '④ After PP reduce: <b style="color:var(--green)">hit count = 6 on all 24 ranks</b> (longest common prefix).'],
+    ['⑤ 所有 rank 都只 prefetch / 建树到 6 → <span style="color:var(--green)">24 个 rank 的 radix tree 完全一致 ✓</span>',
+     '⑤ Every rank prefetches / builds the tree only up to 6 → <span style="color:var(--green)">all 24 radix trees are identical ✓</span>'],
+    // tab2 legend + note + groups + init + captions + banners
+    ['<span class="sw" style="background:var(--purple)"></span><b>prefetch_thread</b>（独立后台线程）· reduce(storage_hit_count)','<span class="sw" style="background:var(--purple)"></span><b>prefetch_thread</b> (independent background thread) · reduce(storage_hit_count)'],
+    ['<span class="sw" style="background:var(--cyan)"></span><b>prefetch_sync_thread</b>（独立后台线程）· reduce(completed_tokens)','<span class="sw" style="background:var(--cyan)"></span><b>prefetch_sync_thread</b> (independent background thread) · reduce(completed_tokens)'],
+    ['每个 cell = 1 个 rank，内含 2 个独立后台线程（小圆点 ●A ●B）。每一行是一个 <b>TP communicator</b>，每一列是一个 <b>PP communicator</b>。',
+     'Each cell = 1 rank, holding 2 independent background threads (dots ●A ●B). Each row is a <b>TP communicator</b>, each column a <b>PP communicator</b>.'],
+    ['<b>prefetch_hits_sync_groups</b><br>命中页数归约组（含 TP 环 + PP 环）<br><span style="font-size:11px">reduce(storage_hit_count)</span>',
+     '<b>prefetch_hits_sync_groups</b><br>hit-count reduce set (TP rings + PP rings)<br><span style="font-size:11px">reduce(storage_hit_count)</span>'],
+    ['<b>prefetch_completion_sync_groups</b><br>完成 token 归约组（含 TP 环 + PP 环）<br><span style="font-size:11px">reduce(completed_tokens)</span>',
+     '<b>prefetch_completion_sync_groups</b><br>completed-token reduce set (TP rings + PP rings)<br><span style="font-size:11px">reduce(completed_tokens)</span>'],
+    ['选择场景：用 <b>1 套组</b> 会死锁，用 <b>2 套组</b> 则安全。','Pick a scenario: <b>1 group set</b> deadlocks, <b>2 group sets</b> are safe.'],
+    ['只有 <b>1 套组</b>：prefetch_thread(A) 与 prefetch_sync_thread(B) 共用同一个 communicator 集。',
+     'Only <b>1 group set</b>: prefetch_thread(A) and prefetch_sync_thread(B) share the same communicator set.'],
+    ['两个后台线程<b>独立调度、顺序不定</b>：同一个 TP 环里，有的 rank 先发 A，有的先发 B。',
+     'The two background threads are <b>scheduled independently, order unpredictable</b>: within one TP ring some ranks post A first, others post B first.'],
+    ['同一个 communicator 上各 rank 提交的 collective <b style="color:var(--red)">不是同一个</b>（A 与 B 错位）→ rendezvous 永远配不上。',
+     'On the same communicator the collectives submitted by different ranks are <b style="color:var(--red)">not the same</b> (A vs B misaligned) → rendezvous never matches.'],
+    ['只要任一 communicator 上 A/B 交错，该环就死锁 → 全局 PP/TP 通信连环卡住。',
+     'If A/B interleave on any communicator, that ring deadlocks → global PP/TP communication stalls in a chain reaction.'],
+    ['💥 DEADLOCK — 整个 24-rank job 卡死','💥 DEADLOCK — the whole 24-rank job hangs'],
+    ['用 <b>2 套独立组</b>：<b style="color:var(--purple)">A 永远走 prefetch_hits_sync_groups</b>，<b style="color:var(--cyan)">B 永远走 prefetch_completion_sync_groups</b>。',
+     'With <b>2 independent group sets</b>: <b style="color:var(--purple)">A always uses prefetch_hits_sync_groups</b>, <b style="color:var(--cyan)">B always uses prefetch_completion_sync_groups</b>.'],
+    ['第一波：所有 rank 的 <b>prefetch_thread</b> 只在 <code>prefetch_hits_sync_groups</code> 上提交 A → 序列一致。',
+     'Wave 1: every rank\u2019s <b>prefetch_thread</b> posts A only on <code>prefetch_hits_sync_groups</code> → consistent order.'],
+    ['✓ TP 环 + PP 环上 A 全部到齐 → 第一波归约完成。','✓ A arrives on every TP ring + PP ring → wave 1 reduce done.'],
+    ['第二波：所有 rank 的 <b>prefetch_sync_thread</b> 只在 <code>prefetch_completion_sync_groups</code> 上提交 B → 序列一致。',
+     'Wave 2: every rank\u2019s <b>prefetch_sync_thread</b> posts B only on <code>prefetch_completion_sync_groups</code> → consistent order.'],
+    ['每个 communicator 上的 collective 序列在所有 rank <b style="color:var(--green)">完全一致</b>（A→组1、B→组2 不交叉）→ 不会死锁。',
+     'The collective sequence on each communicator is <b style="color:var(--green)">identical across ranks</b> (A→set1, B→set2, never crossing) → no deadlock.'],
+    ['✅ 安全 — 24 个 rank 全部对齐完成','✅ Safe — all 24 ranks aligned and complete'],
+    // tab3 titles / conduits / flow-note / lane hint / captions / formers
+    ['③ 主 PP 流水线执行<strong>时序</strong> <span class="tag gpu">NCCL · GPU</span> <span style="color:var(--muted);font-size:11px;">时序连续、错峰流动，<strong style="color:var(--green)">不被后台 prefetch 同步打断</strong></span>',
+     '③ Main PP pipeline execution <strong>timing</strong> <span class="tag gpu">NCCL · GPU</span> <span style="color:var(--muted);font-size:11px;">continuous, staggered flow, <strong style="color:var(--green)">never interrupted by background prefetch sync</strong></span>'],
+    ['↑ 流水线跑的正是②组好的 <strong>mb0→mb3</strong>，沿 stage0→1→2 错峰对角推进',
+     '↑ The pipeline runs exactly the <strong>mb0→mb3</strong> composed in ②, advancing diagonally stage0→1→2'],
+    ['▲ 组好的 <b>batch &amp; micro-batch 顺序</b> 喂给流水线（内容）','▲ The composed <b>batch &amp; micro-batch order</b> feeds the pipeline (content)'],
+    ['② 三个 PP rank 用<strong>同一个 storage hit</strong> 组 batch（内容必须逐 rank 一致）',
+     '② The three PP ranks compose the batch from <strong>the same storage hit</strong> (content must match per rank)'],
+    ['▲ <code>all_reduce(MIN)</code> 输出统一值 <b>6</b> → 决定 batch size','▲ <code>all_reduce(MIN)</code> outputs the unified value <b>6</b> → determines batch size'],
+    ['① 异步 prefetch 查询 → <code>all_reduce(MIN)</code> <span class="tag cpu">gloo · CPU 后台线程</span>',
+     '① Async prefetch query → <code>all_reduce(MIN)</code> <span class="tag cpu">gloo · CPU background thread</span>'],
+    ['（等待②组好的 batch…）','(waiting for batch from ②…)'],
+    ['① 三个 PP rank 的 prefetch 查询<strong>异步发起</strong>（到达时刻不同）。','① The three PP ranks issue prefetch queries <strong>asynchronously</strong> (different arrival times).'],
+    ['① 先到的 rank 在 <span class="k">gloo CPU 后台线程</span>上<b style="color:var(--amber)">等待对齐</b>（不占 GPU）。',
+     '① Earlier ranks <b style="color:var(--amber)">wait to align</b> on the <span class="k">gloo CPU background thread</span> (no GPU use).'],
+    ['① <b style="color:var(--amber)">pp2 最慢</b>到达 → <code>all_reduce(MIN)</code> 把 8/6/7 <strong style="color:var(--green)">统一成 6</strong>。',
+     '① <b style="color:var(--amber)">pp2 is slowest</b> to arrive → <code>all_reduce(MIN)</code> unifies 8/6/7 <strong style="color:var(--green)">into 6</strong>.'],
+    ['② 统一后的 <b>storage hit = 6</b> 下发给各 rank 调度器 → 决定<strong>已缓存前缀长度 / batch size / micro-batch 顺序</strong>（mb0→mb3）。',
+     '② The unified <b>storage hit = 6</b> goes to each rank\u2019s scheduler → determines <strong>cached prefix length / batch size / micro-batch order</strong> (mb0→mb3).'],
+    ['③ 三个 rank 因拿到<strong style="color:var(--green)">同一个 6</strong> 而组出<strong style="color:var(--green)">完全一致的 batch 与 mb 顺序</strong>，喂给 PP 流水线；执行时序连续不被打断。<br><span style="color:var(--red)">⚠ 若 storage hit 不统一 → batch/mb 顺序逐 rank 发散 → PP 调度错位、卡死。</span>',
+     '③ Because all three ranks get <strong style="color:var(--green)">the same 6</strong>, they compose <strong style="color:var(--green)">identical batches and mb order</strong> and feed them to the PP pipeline; execution timing stays continuous and uninterrupted.<br><span style="color:var(--red)">⚠ If storage hit weren\u2019t unified → batch/mb order diverges per rank → PP scheduling mismatch & hang.</span>'],
+    // formers
+    ['调度器 · PP rank 0','Scheduler · PP rank 0'],['调度器 · PP rank 1','Scheduler · PP rank 1'],['调度器 · PP rank 2','Scheduler · PP rank 2'],
+    ['已缓存前缀 storage hit = ','cached prefix storage hit = '],
+    [' 页 → 决定 batch 组成',' pages → determines batch composition'],
+    ['✓ batch & mb 顺序一致','✓ identical batch & mb order'],
+    // mesh labels
+    ['<b>PP stage 0</b><br>(TP 组)','<b>PP stage 0</b><br>(TP group)'],
+    ['<b>PP stage 1</b><br>(TP 组)','<b>PP stage 1</b><br>(TP group)'],
+    ['<b>PP stage 2</b><br>(TP 组)','<b>PP stage 2</b><br>(TP group)'],
+    ['PP 组(列)<br>每列跨 3 个 stage →','PP groups (cols)<br>each spans 3 stages →'],
+    // tab4 flow boxes
+    ['调度器 Scheduler <span class="pin">主线程</span>','Scheduler <span class="pin">main thread</span>'],
+    ['发起 prefetch 请求（writeback / load）','Issues prefetch requests (writeback / load)'],
+    ['▼ <b>prefetch_queue</b>（PrefetchOperation）','▼ <b>prefetch_queue</b> (PrefetchOperation)'],
+    ['① prefetch_thread <span class="pin">storage-hit 线程</span>','① prefetch_thread <span class="pin">storage-hit thread</span>'],
+    ['<code>_storage_hit_query()</code> 查询 L3 命中页数；命中足够→放 prefetch_buffer，不足→prefetch_revoke_queue',
+     '<code>_storage_hit_query()</code> queries L3 hit pages; enough hits → prefetch_buffer, too few → prefetch_revoke_queue'],
+    ['◆ all_reduce(MIN) storage_hit_count <small>@ prefetch_hits_sync_groups（组1，gloo/CPU，含 TP 环 + PP 环）</small>',
+     '◆ all_reduce(MIN) storage_hit_count <small>@ prefetch_hits_sync_groups (set 1, gloo/CPU, TP rings + PP rings)</small>'],
+    ['▼ <b>prefetch_buffer</b>','▼ <b>prefetch_buffer</b>'],
+    ['② prefetch_io_aux_thread <span class="pin">IO 加载线程</span>','② prefetch_io_aux_thread <span class="pin">IO load thread</span>'],
+    ['<code>_page_transfer()</code> 逐 batch 把页从 L3 读入 host；累加 <b>completed_tokens</b>；<b>每个 storage batch 产生 1 个 PrefetchAck</b>（出错也照常产生）',
+     '<code>_page_transfer()</code> loads pages L3→host batch by batch; accumulates <b>completed_tokens</b>; <b>each storage batch emits exactly 1 PrefetchAck</b> (even on error)'],
+    ['▼ <b>prefetch_sync_queue</b>（PrefetchAck）','▼ <b>prefetch_sync_queue</b> (PrefetchAck)'],
+    ['③ prefetch_sync_thread <span class="pin">completion-token 线程</span>','③ prefetch_sync_thread <span class="pin">completion-token thread</span>'],
+    ['对每个 ack 的 <b>completed_tokens</b> 做归约','Reduces <b>completed_tokens</b> of every ack'],
+    ['◆ all_reduce(MIN) completed_tokens <small>@ prefetch_completion_sync_groups（组2，gloo/CPU，含 TP 环 + PP 环）</small>',
+     '◆ all_reduce(MIN) completed_tokens <small>@ prefetch_completion_sync_groups (set 2, gloo/CPU, TP rings + PP rings)</small>'],
+    ['▼ <b>ack_prefetch_queue</b>','▼ <b>ack_prefetch_queue</b>'],
+    ['调度器写入 host radix tree','Scheduler inserts into host radix tree'],
+    ['只插入 <b>completed_tokens</b> 长度的前缀 → <code>_insert_helper_host()</code>','Inserts only the <b>completed_tokens</b>-long prefix → <code>_insert_helper_host()</code>'],
+    ['为什么 MIN(storage_hit) 一致？','Why does MIN(storage_hit) ensure consistency?'],
+    ['各 rank 命中可能不同（host 内存截断、L3 视图差异）。MIN 取<b>最长公共可命中前缀</b> → 所有 rank <b>抓取范围一致</b>，不会各抓不同长度。',
+     'Hits may differ per rank (host-mem truncation, L3 view differences). MIN takes the <b>longest common hittable prefix</b> → every rank <b>fetches the same range</b>, never different lengths.'],
+    ['为什么 MIN(completed_tokens) 一致？','Why does MIN(completed_tokens) ensure consistency?'],
+    ['即便抓取范围一致，实际逐页加载仍可能<b>部分失败</b>（<code>page_get</code> 返回 n≠batch）。MIN 只提交<b>所有 rank 都成功落盘的最长公共前缀</b> → 写入 host tree 的长度逐 rank 相同。',
+     'Even with the same fetch range, per-page loads can <b>partially fail</b> (<code>page_get</code> returns n≠batch). MIN commits only the <b>longest common prefix every rank loaded successfully</b> → identical insert length per rank.'],
+    ['为什么不会 hang？','Why no hang?'],
+    ['每个 storage batch <b>都产生且仅产生一个 PrefetchAck</b>（即使出错也照常产生）→ 每个 rank 参与的 reduce <b>次数严格相等</b>，collective 一一对齐。两个 MIN 一起保证：<b>插入 host tree 的前缀逐 rank 完全相同 → 树一致</b>。',
+     'Each storage batch <b>emits exactly one PrefetchAck</b> (even on error) → every rank joins the <b>same number of reduces</b>, collectives align one-to-one. The two MINs together guarantee: <b>the prefix inserted into the host tree is identical per rank → trees are consistent</b>.'],
+    ['两个 MIN 同步点（组1 命中数、组2 完成数）+ 每 batch 恒定 1 个 ack，共同保证 PP 各 rank 的 host radix tree 严格一致。',
+     'Two MIN sync points (set 1 = hit count, set 2 = completed tokens) + exactly one ack per batch together keep every PP rank\u2019s host radix tree strictly identical.'],
+    // tab4 animated step captions
+    ['沿数据流向下逐步点亮：两个 MIN 同步点 + 每 batch 恒定 1 个 ack。',
+     'Light up step by step along the data flow: two MIN sync points + exactly one ack per batch.'],
+    ['调度器主线程把 prefetch 请求（writeback / load）放入队列，触发后台流水线。',
+     'The scheduler main thread enqueues a prefetch request (writeback / load), kicking off the background pipeline.'],
+    ['<b>PrefetchOperation</b> 入队 <code>prefetch_queue</code>，交给后台线程处理。',
+     'A <b>PrefetchOperation</b> enters <code>prefetch_queue</code>, handed to the background threads.'],
+    ['① <b>prefetch_thread</b> 调 <code>_storage_hit_query()</code> 查询 L3 命中页数（各 rank 可能不同）。',
+     '① <b>prefetch_thread</b> calls <code>_storage_hit_query()</code> to query L3 hit pages (may differ per rank).'],
+    ['◆ <b style="color:var(--amber)">第一个 MIN</b>：在 <code>prefetch_hits_sync_groups</code>（组1）对 storage_hit_count 取最小 → <b>抓取范围逐 rank 一致</b>。',
+     '◆ <b style="color:var(--amber)">First MIN</b>: take the min of storage_hit_count on <code>prefetch_hits_sync_groups</code> (set 1) → <b>the fetch range is identical per rank</b>.'],
+    ['命中足够的请求落入 <code>prefetch_buffer</code>，进入实际 IO 加载。',
+     'Requests with enough hits drop into <code>prefetch_buffer</code> for the actual IO load.'],
+    ['② <b>prefetch_io_aux_thread</b> 用 <code>_page_transfer()</code> 逐 batch 把页 L3→host；<b>每个 batch 恒产生 1 个 PrefetchAck</b>（出错也产生）。',
+     '② <b>prefetch_io_aux_thread</b> uses <code>_page_transfer()</code> to move pages L3→host batch by batch; <b>each batch always emits exactly one PrefetchAck</b> (even on error).'],
+    ['每个 batch 的 <b>PrefetchAck</b> 入队 <code>prefetch_sync_queue</code>。',
+     'Each batch\u2019s <b>PrefetchAck</b> enters <code>prefetch_sync_queue</code>.'],
+    ['③ <b>prefetch_sync_thread</b> 对每个 ack 的 <b>completed_tokens</b> 做归约。',
+     '③ <b>prefetch_sync_thread</b> reduces the <b>completed_tokens</b> of every ack.'],
+    ['◆ <b style="color:var(--green)">第二个 MIN</b>：在 <code>prefetch_completion_sync_groups</code>（组2）对 completed_tokens 取最小 → <b>真正落盘前缀逐 rank 一致</b>。',
+     '◆ <b style="color:var(--green)">Second MIN</b>: take the min of completed_tokens on <code>prefetch_completion_sync_groups</code> (set 2) → <b>the actually-loaded prefix is identical per rank</b>.'],
+    ['统一后的结果入队 <code>ack_prefetch_queue</code> 回到调度器。',
+     'The unified result enters <code>ack_prefetch_queue</code> back to the scheduler.'],
+    ['调度器只插入 <b>completed_tokens</b> 长度的前缀 → <code>_insert_helper_host()</code>。每 batch 恒 1 个 ack，<b>reduce 次数严格相等 → 不会 hang</b>。',
+     'The scheduler inserts only the <b>completed_tokens</b>-long prefix → <code>_insert_helper_host()</code>. One ack per batch means <b>equal reduce counts → no hang</b>.'],
+    ['✅ 闭环：两个 MIN（组1 命中数 + 组2 完成数）+ 每 batch 1 个 ack → <b style="color:var(--green)">PP 各 rank 的 host radix tree 严格一致</b>。',
+     '✅ Closed loop: two MINs (set 1 hit count + set 2 completed tokens) + one ack per batch → <b style="color:var(--green)">every PP rank\u2019s host radix tree is strictly identical</b>.'],
+  ];
+
+  const SEL = ['header h1','header p','.tab','.tree-title','.caption','.banner','.legend .chip',
+    '#scene2 .note','.grp','.t3-title','.clabel','.flow-note','.lane-hint','.ctl',
+    '.pp-label','.pp-foot .lab','.former h5','.former .hitbox .ht1','.former .hitbox .ht2','.former .chk',
+    '.tbox .tname','.tbox .tdesc','.tarrow','.minnode','.whycard h4','.whycard p',
+    '.gpu-badge span','.l3lab','.syncbadge','.consist-flag','#scene6 .note','.barlabel'].join(',');
+
+  const tmp=document.createElement('div');
+  const strip=h=>{ tmp.innerHTML=h; return tmp.textContent.replace(/\s+/g,' ').trim(); };
+  const EN={}, ZH={};
+  PAIRS.forEach(([zh,en])=>{ EN[strip(zh)]=en; ZH[strip(en)]=zh; });
+
+  let LANG='zh';
+  // runtime helper for dynamic strings that contain interpolated values
+  // (cannot be matched by the static dictionary). Reads the live LANG.
+  window.TR=(zh,en)=> LANG==='en' ? en : zh;
+  let mo=null;
+  let suppress=false;   // set while we write translations; observer callbacks run later as microtasks, so takeRecords() is what actually discards our own mutations
+  function translateEl(el){
+    const k=strip(el.innerHTML);
+    const next = LANG==='en' ? EN[k] : ZH[k];
+    // only write when there is a real change, otherwise we churn the DOM
+    if(next!==undefined && next!==el.innerHTML) el.innerHTML=next;
+  }
+  function translateAll(){
+    suppress=true;
+    document.querySelectorAll(SEL).forEach(translateEl);
+    if(mo) mo.takeRecords();   // drop the records our own writes just generated
+    suppress=false;
+  }
+
+  window.toggleLang=function(){
+    LANG = LANG==='zh' ? 'en' : 'zh';
+    document.getElementById('langBtn').textContent = LANG==='zh' ? 'EN' : '中文';
+    document.documentElement.lang = LANG==='zh' ? 'zh-CN' : 'en';
+    translateAll();
+  };
+
+  // keep dynamic captions translated as JS rewrites them
+  mo=new MutationObserver(muts=>{
+    if(suppress) return;       // skip the mutations our own translations produced
+    suppress=true;
+    muts.forEach(m=>{
+      const tgt = m.target.nodeType===1 ? m.target : m.target.parentElement;
+      if(!tgt) return;
+      const c = tgt.closest && tgt.closest(SEL);
+      if(c) translateEl(c);
+    });
+    mo.takeRecords();
+    suppress=false;
+  });
+  mo.observe(document.body,{subtree:true,childList:true,characterData:true});
+})();
+</script>
+</body>
+</html>
+<style>#langBtn,header,.tabs{display:none!important;}body{background:#0e1117;}.wrap{padding-top:10px;}</style>
+<script>
+(function(){
+  // English-only, single-tab embed: reuse all original JS
+  try{ if(window.toggleLang) toggleLang(); }catch(e){}   // zh -> en
+  var TAB="consistency";
+  var btn=document.querySelector('.tab[data-tab="'+TAB+'"]');
+  if(btn){ btn.click(); }
+})();
+</script>
diff --git a/public/images/blog/pp_hicache_consistency/hicache_pp_animation_en_deadlock.html b/public/images/blog/pp_hicache_consistency/hicache_pp_animation_en_deadlock.html
new file mode 100644
index 000000000..c6a939bf7
--- /dev/null
+++ b/public/images/blog/pp_hicache_consistency/hicache_pp_animation_en_deadlock.html
@@ -0,0 +1,1559 @@
+<!DOCTYPE html>
+<html lang="zh-CN">
+<head>
+<meta charset="UTF-8" />
+<meta name="viewport" content="width=device-width, initial-scale=1.0" />
+<title>HiCache × PP=3 · TP=8: Tree Consistency &amp; Deadlock Prevention Animation</title>
+<style>
+  :root{
+    --bg:#0e1117; --panel:#161b22; --panel2:#1c2330; --line:#30363d;
+    --text:#e6edf3; --muted:#8b949e;
+    --blue:#58a6ff; --green:#3fb950; --red:#f85149; --amber:#d29922;
+    --purple:#bc8cff; --cyan:#56d4dd;
+  }
+  *{box-sizing:border-box;}
+  body{
+    margin:0; background:radial-gradient(1200px 600px at 50% -10%, #18202c, var(--bg));
+    color:var(--text); font-family:-apple-system,BlinkMacSystemFont,"Segoe UI","PingFang SC","Microsoft YaHei",sans-serif;
+    line-height:1.5;
+  }
+  #langBtn{ position:fixed; top:14px; right:16px; z-index:50; background:var(--panel2); border:1px solid var(--blue);
+    color:var(--blue); padding:7px 14px; border-radius:999px; cursor:pointer; font-size:13px; font-weight:600; }
+  #langBtn:hover{ background:var(--blue); color:#04101f; }
+  header{ text-align:center; padding:20px 16px 4px; }
+  header h1{ margin:0 0 4px; font-size:21px; }
+  header p{ margin:0; color:var(--muted); font-size:13px; }
+  .tabs{ display:flex; gap:8px; justify-content:center; margin:16px auto 8px; flex-wrap:wrap; }
+  .tab{ background:var(--panel); border:1px solid var(--line); color:var(--text);
+    padding:9px 16px; border-radius:999px; cursor:pointer; font-size:14px; transition:all .15s; }
+  .tab.active{ background:var(--blue); color:#04101f; border-color:var(--blue); font-weight:600; }
+  .wrap{ max-width:1120px; margin:0 auto; padding:0 16px 60px; }
+  .scene{ background:var(--panel); border:1px solid var(--line); border-radius:14px; padding:18px; position:relative; }
+  .hidden{ display:none; }
+  .controls{ display:flex; gap:10px; justify-content:center; align-items:center; margin:14px 0 4px; flex-wrap:wrap; }
+  button.ctl{ background:var(--panel2); border:1px solid var(--line); color:var(--text);
+    padding:8px 16px; border-radius:8px; cursor:pointer; font-size:14px; }
+  button.ctl:hover{ border-color:var(--blue); }
+  button.ctl.primary{ background:var(--green); color:#04140a; border-color:var(--green); font-weight:600; }
+  button.ctl.alt{ background:var(--red); color:#1a0606; border-color:var(--red); font-weight:600; }
+  .caption{ text-align:center; min-height:44px; margin:10px auto 0; max-width:900px; font-size:15px; }
+  .caption .k{ color:var(--cyan); font-weight:600; }
+  code{ background:#0d1117; padding:1px 6px; border-radius:4px; border:1px solid var(--line); color:var(--cyan); font-size:12px; }
+  .legend{ display:flex; gap:18px; justify-content:center; font-size:12px; color:var(--muted); flex-wrap:wrap; margin-bottom:10px; }
+  .chip{ display:inline-flex; align-items:center; gap:6px; }
+  .sw{ width:14px; height:14px; border-radius:4px; display:inline-block; }
+
+  /* ---------- mesh ---------- */
+  .mesh-head{ display:flex; align-items:center; gap:8px; margin-left:96px; margin-bottom:4px; }
+  .tp-hdr{ flex:1; display:flex; gap:6px; }
+  .tp-hdr .th{ flex:1; text-align:center; font-size:11px; color:var(--muted); }
+  .pp-row{ display:flex; align-items:center; gap:8px; margin-bottom:6px; }
+  .pp-label{ width:88px; font-size:12px; color:var(--muted); text-align:right; line-height:1.2; }
+  .pp-label b{ color:var(--text); }
+  .row-cells{ flex:1; display:flex; gap:6px; border:2px solid transparent; border-radius:10px; padding:3px; transition:border-color .3s, box-shadow .3s; }
+  .row-cells.ring-a{ border-color:var(--purple); box-shadow:0 0 12px rgba(188,140,255,.25); }
+  .row-cells.ring-b{ border-color:var(--cyan); box-shadow:0 0 12px rgba(86,212,221,.25); }
+  .row-cells.ring-bad{ border-color:var(--red); box-shadow:0 0 12px rgba(248,81,73,.3); }
+  .cell{
+    flex:1; height:54px; border-radius:8px; background:#0d1117; border:1px solid var(--line);
+    display:flex; flex-direction:column; align-items:center; justify-content:center; gap:1px;
+    transition:background .3s, border-color .3s, transform .15s, box-shadow .15s; position:relative;
+  }
+  .cell .v{ font-size:18px; font-weight:800; color:var(--muted); transition:color .3s; }
+  .cell .v small{ font-size:9px; font-weight:500; }
+  .cell .rk{ font-size:9px; color:#5b6470; }
+  .cell.varied .v{ color:var(--amber); }
+  .cell.tpmin{ background:linear-gradient(180deg,#143055,#102844); border-color:var(--blue); }
+  .cell.tpmin .v{ color:#cfe6ff; }
+  .cell.gmin{ background:linear-gradient(180deg,#0f3a1d,#0e2c18); border-color:var(--green); }
+  .cell.gmin .v{ color:#c4f7d4; }
+  .cell.sweep{ transform:translateY(-3px); box-shadow:0 0 14px rgba(88,166,255,.55); border-color:var(--blue); }
+  .cell.bad{ background:linear-gradient(180deg,#3a1414,#2a1010); border-color:var(--red); }
+  .cell.bad .v{ color:#ffd4d0; }
+  .cell.dim{ opacity:.35; }
+  /* thread dots */
+  .tdots{ display:flex; gap:5px; margin-top:1px; }
+  .td{ width:9px; height:9px; border-radius:50%; border:1px solid var(--line); background:#0d1117; transition:all .25s; }
+  .td.a{ border-color:var(--purple); }
+  .td.b{ border-color:var(--cyan); }
+  .td.a.on{ background:var(--purple); box-shadow:0 0 8px var(--purple); }
+  .td.b.on{ background:var(--cyan); box-shadow:0 0 8px var(--cyan); }
+  .td.done{ background:var(--green); border-color:var(--green); box-shadow:0 0 6px var(--green); }
+  .td.dead{ background:var(--red); border-color:var(--red); box-shadow:0 0 6px var(--red); }
+
+  /* pp-group footer (columns) */
+  .pp-foot{ display:flex; align-items:center; gap:8px; margin-top:6px; }
+  .pp-foot .lab{ width:88px; font-size:11px; color:var(--muted); text-align:right; }
+  .pp-foot .cols{ flex:1; display:flex; gap:6px; padding:0 3px; }
+  .pp-foot .col{ flex:1; height:20px; border-radius:6px; border:1px dashed var(--line); font-size:9px;
+    color:#5b6470; display:flex; align-items:center; justify-content:center; transition:all .3s; }
+  .pp-foot .col.ring-a{ border-color:var(--purple); color:#e3d3ff; }
+  .pp-foot .col.ring-b{ border-color:var(--cyan); color:#cdf6fa; }
+  .pp-foot .col.ring-bad{ border-color:var(--red); color:#ffd4d0; }
+
+  /* shared tree (tab1) */
+  .tree-box{ margin-top:14px; display:flex; flex-direction:column; align-items:center; }
+  .tree-title{ font-size:12px; color:var(--muted); margin-bottom:6px; }
+  .tree{ display:flex; flex-direction:column; align-items:center; gap:4px; min-height:30px; }
+  .tnode{ width:220px; height:20px; border-radius:5px; background:#0d1117; border:1px solid var(--line);
+    display:flex; align-items:center; justify-content:center; font-size:11px; color:var(--muted);
+    opacity:0; transform:translateY(-6px); transition:opacity .25s, transform .25s; }
+  .tnode.show{ opacity:1; transform:none; background:linear-gradient(90deg,#0f3a1d,#196b32); border-color:var(--green); color:#c4f7d4; }
+
+  /* groups panel (tab2) */
+  .groups{ display:flex; gap:16px; justify-content:center; margin-top:12px; flex-wrap:wrap; }
+  .grp{ border:1px dashed var(--line); border-radius:10px; padding:8px 14px; font-size:12px; color:var(--muted);
+    min-width:230px; text-align:center; transition:all .3s; }
+  .grp b{ color:var(--text); }
+  .grp.g1.hot{ border-color:var(--purple); color:#e3d3ff; box-shadow:0 0 14px rgba(188,140,255,.25); }
+  .grp.g2.hot{ border-color:var(--cyan); color:#cdf6fa; box-shadow:0 0 14px rgba(86,212,221,.25); }
+  .banner{ text-align:center; font-weight:800; font-size:18px; min-height:24px; margin-top:8px; }
+  .banner.ok{ color:var(--green); } .banner.bad{ color:var(--red); }
+  .note{ font-size:12px; color:var(--muted); text-align:center; margin-top:4px; }
+
+  /* ---------- tab3: async skew x MIN ---------- */
+  .t3-section{ background:var(--panel2); border:1px solid var(--line); border-radius:12px; padding:10px 12px; margin-bottom:14px; }
+  .t3-title{ font-size:13px; margin-bottom:8px; display:flex; align-items:center; gap:8px; }
+  .t3-title .tag{ font-size:10px; padding:2px 8px; border-radius:999px; border:1px solid var(--line); color:var(--muted); }
+  .t3-title .tag.gpu{ border-color:var(--blue); color:#cfe6ff; }
+  .t3-title .tag.cpu{ border-color:var(--amber); color:#ffe2ab; }
+  /* top pipeline */
+  .pipe{ position:relative; }
+  .lane{ position:relative; height:34px; margin:6px 0; border-radius:8px; background:#0d1117;
+    border:1px solid var(--line); overflow:hidden; }
+  .lane .lname{ position:absolute; left:8px; top:50%; transform:translateY(-50%); font-size:11px; color:var(--muted); z-index:3; }
+  .mb{ position:absolute; top:6px; height:22px; width:52px; border-radius:6px; z-index:2;
+    display:flex; align-items:center; justify-content:center; font-size:10px; font-weight:700;
+    opacity:0; color:#c4f7d4; border:1px solid var(--green);
+    background:linear-gradient(180deg,#0f3a1d,#15311f); transition:opacity .18s; }
+  .lane-hint{ position:absolute; right:10px; top:50%; transform:translateY(-50%); font-size:10px; color:#46505e; z-index:1; }
+  .flow-note{ font-size:11px; color:var(--green); text-align:right; margin-top:2px; }
+
+  /* bottom sync area */
+  .sync-area{ position:relative; height:170px; border-radius:8px; background:#0d1117; border:1px solid var(--line); overflow:hidden; }
+  .barrier{ position:absolute; top:6px; bottom:6px; width:0; border-left:2px dashed var(--amber); z-index:2; }
+  .barrier .blabel{ position:absolute; top:-2px; left:8px; font-size:11px; color:var(--amber); white-space:nowrap; }
+  .barrier.fire{ border-left-color:var(--green); box-shadow:-2px 0 18px rgba(63,185,80,.5); }
+  .slane{ position:absolute; left:0; right:0; height:1px; border-top:1px dashed #1f2733; }
+  .slabel{ position:absolute; left:8px; font-size:11px; color:var(--muted); z-index:3; transform:translateY(-50%); }
+  .pkt{ position:absolute; height:30px; width:108px; border-radius:8px; z-index:3; transform:translateY(-50%);
+    display:flex; align-items:center; justify-content:center; gap:6px; font-size:11px; font-weight:700;
+    border:1px solid var(--line); background:#11161f; transition:background .25s, border-color .25s, box-shadow .25s; }
+  .pkt .hv{ font-size:14px; }
+  .pkt.travel{ border-color:var(--blue); color:#cfe6ff; }
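+  .pkt.wait{ border-color:var(--amber); color:#ffe2ab; background:#1d1808; animation:pulse 1s infinite; } /* assumed definition: `pulse` is referenced by .pkt.wait, .ackchip.wait and .bar.waiting but not declared in this stylesheet */ @keyframes pulse{ 0%,100%{opacity:1;} 50%{opacity:.6;} }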
+  .pkt.unified{ border-color:var(--green); color:#c4f7d4; background:linear-gradient(180deg,#0f3a1d,#0e2c18); box-shadow:0 0 12px rgba(63,185,80,.35); }
+  .clock{ position:absolute; right:10px; top:8px; font-size:11px; color:var(--muted); z-index:4; }
+  /* causal arrows + batch formers */
+  .t3-conduit{ position:relative; height:38px; margin:-2px 0 8px; display:flex; align-items:center; justify-content:center; }
+  .t3-conduit .clabel{ font-size:12px; color:var(--muted); transition:color .3s; z-index:2; background:var(--panel); padding:0 8px; }
+  .t3-conduit.hot .clabel{ color:var(--green); }
+  .t3-conduit::before{ content:""; position:absolute; left:50%; top:4px; bottom:4px; width:2px; transform:translateX(-50%);
+    background:repeating-linear-gradient(to top, #2b3340 0 6px, transparent 6px 12px); }
+  .t3-conduit.hot::before{ background:repeating-linear-gradient(to top, var(--green) 0 6px, transparent 6px 12px); opacity:.5; }
+  .t3-conduit .spark{ position:absolute; left:50%; bottom:3px; width:11px; height:11px; border-radius:50%;
+    background:var(--green); opacity:0; box-shadow:0 0 12px var(--green); transform:translateX(-50%); z-index:3; }
+  .t3-conduit.hot .spark{ animation:rise .85s linear infinite; }
+  .t3-conduit.hot .spark.s2{ animation-delay:.42s; }
+  @keyframes rise{ 0%{opacity:0; transform:translate(-50%,0);} 15%{opacity:1;} 100%{opacity:0; transform:translate(-50%,-34px);} }
+  .pipe.fed .lane{ border-color:var(--green); box-shadow:0 0 10px rgba(63,185,80,.25); }
+  .formers{ display:flex; gap:14px; justify-content:center; flex-wrap:wrap; }
+  .former{ flex:1; min-width:240px; max-width:320px; background:#0d1117; border:1px solid var(--line);
+    border-radius:10px; padding:10px 12px; transition:border-color .3s, box-shadow .3s; }
+  .former h5{ margin:0 0 6px; font-size:13px; }
+  .former .hitbox{ font-size:12px; color:var(--muted); margin-bottom:8px; }
+  .former .hitbox b{ color:var(--amber); font-size:14px; }
+  .former.ready{ border-color:var(--green); box-shadow:0 0 12px rgba(63,185,80,.2); }
+  .former.ready .hitbox b{ color:var(--green); }
+  .mbrow{ display:flex; gap:6px; }
+  .mbchip{ flex:1; height:24px; border-radius:6px; background:#11161f; border:1px solid var(--line);
+    display:flex; align-items:center; justify-content:center; font-size:11px; color:var(--muted); opacity:.3; transition:all .25s; }
+  .mbchip.on{ opacity:1; background:linear-gradient(180deg,#143055,#102844); border-color:var(--blue); color:#cfe6ff; }
+  .mbchip.fixed{ opacity:1; background:linear-gradient(180deg,#0f3a1d,#0e2c18); border-color:var(--green); color:#c4f7d4; }
+  .former .chk{ font-size:12px; color:var(--green); margin-top:6px; min-height:16px; }
+
+  /* ---------- tab4: thread relationships ---------- */
+  .t4wrap{ display:flex; gap:18px; flex-wrap:wrap; }
+  .t4flow{ flex:2; min-width:340px; display:flex; flex-direction:column; align-items:stretch; gap:0; }
+  .t4why{ flex:1; min-width:280px; display:flex; flex-direction:column; gap:10px; }
+  .tbox{ border:1px solid var(--line); border-radius:10px; padding:10px 12px; background:#0d1117; position:relative; }
+  .tbox .tname{ font-size:14px; font-weight:700; display:flex; align-items:center; gap:8px; }
+  .tbox .tdesc{ font-size:11.5px; color:var(--muted); margin-top:3px; }
+  .tbox .pin{ font-size:10px; padding:1px 7px; border-radius:999px; border:1px solid var(--line); color:var(--muted); }
+  .tbox.thread-hit{ border-left:4px solid var(--purple); }
+  .tbox.thread-io{ border-left:4px solid var(--blue); }
+  .tbox.thread-sync{ border-left:4px solid var(--cyan); }
+  .tbox.sched{ border-left:4px solid var(--muted); background:#11161f; }
+  .tarrow{ text-align:center; color:var(--muted); font-size:11px; padding:5px 0; position:relative; }
+  .tarrow b{ color:var(--text); }
+  .minnode{ align-self:center; margin:6px 0; border:1.5px solid var(--amber); border-radius:999px;
+    padding:7px 16px; font-size:12px; color:#ffe2ab; background:#1d1808; font-weight:600; }
+  .minnode.g2{ border-color:var(--green); color:#c4f7d4; background:#0f2a18; }
+  .minnode small{ display:block; font-size:10px; color:var(--muted); font-weight:400; }
+  .whycard{ border:1px solid var(--line); border-radius:10px; padding:11px 13px; background:var(--panel2); }
+  .whycard h4{ margin:0 0 6px; font-size:13px; }
+  .whycard p{ margin:0; font-size:12px; color:var(--muted); }
+  .whycard.a{ border-left:4px solid var(--purple); }
+  .whycard.b{ border-left:4px solid var(--cyan); }
+  .whycard.c{ border-left:4px solid var(--green); }
+  .whycard code{ font-size:11px; }
+
+  /* ---------- tab4 animation states ---------- */
+  #scene4 .tbox,#scene4 .tarrow,#scene4 .minnode,#scene4 .whycard{ transition:all .3s; }
+  #scene4 .dimmed{ opacity:.3; }
+  .tbox.lit{ box-shadow:0 0 18px rgba(88,166,255,.45); transform:translateX(5px); }
+  .tbox.thread-hit.lit{ box-shadow:0 0 18px rgba(188,140,255,.55); }
+  .tbox.thread-io.lit{ box-shadow:0 0 18px rgba(88,166,255,.55); }
+  .tbox.thread-sync.lit{ box-shadow:0 0 18px rgba(86,212,221,.55); }
+  .tbox.sched.lit{ box-shadow:0 0 18px rgba(63,185,80,.4); }
+  .tarrow.lit{ color:var(--green); font-weight:700; }
+  .tarrow.lit b{ color:var(--green); }
+  .tarrow.lit::after{ content:" ●"; color:var(--green); animation:t4blink .7s infinite; }
+  @keyframes t4blink{ 0%,100%{opacity:.2;} 50%{opacity:1;} }
+  .minnode.lit{ box-shadow:0 0 22px rgba(210,153,34,.65); transform:scale(1.06); }
+  .minnode.g2.lit{ box-shadow:0 0 22px rgba(63,185,80,.65); }
+  .whycard.lit{ box-shadow:0 0 18px rgba(63,185,80,.35); transform:translateY(-3px); border-left-width:6px; }
+
+  /* ---------- tab5: two-request full lifecycle story ---------- */
+  #scene5 .story-top{ display:flex; gap:14px; align-items:stretch; margin-bottom:14px; }
+  .gpu-badge{ width:120px; border:1px solid var(--line); border-radius:12px; background:#0d1117;
+    display:flex; flex-direction:column; align-items:center; justify-content:center; gap:2px;
+    font-size:12px; color:var(--muted); transition:all .3s; }
+  .gpu-badge .ic{ font-size:22px; }
+  .gpu-badge.busy{ border-color:var(--blue); color:#cfe6ff; box-shadow:0 0 18px rgba(88,166,255,.45);
+    background:linear-gradient(180deg,#143055,#102844); animation:gpupulse 1s infinite; }
+  @keyframes gpupulse{ 0%,100%{box-shadow:0 0 12px rgba(88,166,255,.3);} 50%{box-shadow:0 0 22px rgba(88,166,255,.6);} }
+  .l3box{ flex:1; border:1px solid var(--line); border-radius:12px; background:#0d1117; padding:8px 12px; transition:all .3s; }
+  .l3box.hot{ border-color:var(--green); box-shadow:0 0 14px rgba(63,185,80,.25); }
+  .l3title{ font-size:12px; color:var(--muted); margin-bottom:6px; display:flex; justify-content:space-between; align-items:center; }
+  .l3title .badge{ font-size:10px; padding:1px 8px; border-radius:999px; border:1px solid var(--line); color:var(--muted); min-width:46px; text-align:center; }
+  .l3title .badge.miss{ border-color:var(--red); color:#ffd4d0; }
+  .l3title .badge.hit{ border-color:var(--green); color:#c4f7d4; }
+  .pagerow{ display:flex; gap:6px; flex-wrap:wrap; min-height:28px; }
+  .pg{ width:38px; height:26px; border-radius:6px; border:1px solid var(--line); background:#11161f;
+    display:flex; align-items:center; justify-content:center; font-size:11px; color:#5b6470;
+    transition:all .3s; opacity:0; transform:scale(.6); }
+  .pg.show{ opacity:1; transform:none; }
+  .pg.l3{ border-color:var(--green); color:#c4f7d4; background:linear-gradient(180deg,#0f3a1d,#0e2c18); }
+
+  #scene5 .ranks{ display:flex; flex-direction:column; gap:10px; }
+  .ranklane{ border:1px solid var(--line); border-radius:12px; padding:8px 12px; background:var(--panel2); transition:all .3s; }
+  .ranklane.active{ border-color:var(--blue); box-shadow:0 0 12px rgba(88,166,255,.22); }
+  .rankhdr{ display:flex; align-items:center; gap:10px; margin-bottom:6px; font-size:12px; }
+  .rankhdr .rname{ font-weight:700; color:var(--text); }
+  .rankhdr .tps{ display:flex; gap:4px; }
+  .rankhdr .tp{ width:8px; height:8px; border-radius:50%; background:#11161f; border:1px solid var(--line); transition:all .25s; }
+  .rankhdr .tp.on{ background:var(--blue); border-color:var(--blue); box-shadow:0 0 6px var(--blue); }
+  .rankhdr .rstat{ margin-left:auto; font-size:11px; padding:1px 9px; border-radius:999px; border:1px solid var(--line); color:var(--muted); }
+  .rankhdr .rstat.miss{ border-color:var(--red); color:#ffd4d0; }
+  .rankhdr .rstat.hit{ border-color:var(--green); color:#c4f7d4; }
+  .rankhdr .rstat.warn{ border-color:var(--amber); color:#ffe2ab; }
+  .htree{ display:flex; align-items:center; gap:8px; min-height:28px; }
+  .htree .root{ font-size:10px; color:var(--muted); padding:2px 7px; border:1px dashed var(--line); border-radius:6px; }
+  .htnode{ width:40px; height:26px; border-radius:6px; border:1px solid var(--line); background:#11161f;
+    display:flex; align-items:center; justify-content:center; font-size:11px; color:#5b6470; position:relative;
+    transition:all .35s; opacity:0; transform:translateY(-6px) scale(.7); }
+  .htnode::before{ content:""; position:absolute; left:-8px; top:50%; width:8px; height:1px; background:var(--line); }
+  .htnode:first-of-type::before{ display:none; }
+  .htnode.show{ opacity:1; transform:none; }
+  .htnode.committed{ border-color:var(--green); color:#c4f7d4; background:linear-gradient(180deg,#0f3a1d,#0e2c18); }
+  .htnode.inserting{ border-color:var(--blue); color:#cfe6ff; background:linear-gradient(180deg,#143055,#102844); }
+  .htnode.matched{ border-color:var(--cyan); color:#cdf6fa; box-shadow:0 0 8px rgba(86,212,221,.4); }
+  .htnode.warn{ border-color:var(--amber); color:#ffe2ab; background:#1d1808; }
+  .htnode.evict{ border-color:var(--red); color:#ffd4d0; background:linear-gradient(180deg,#3a1414,#2a1010); }
+  .story-sync{ display:flex; gap:14px; justify-content:center; margin:14px 0 6px; flex-wrap:wrap; }
+  .syncbadge{ font-size:12px; padding:6px 14px; border-radius:999px; border:1.5px solid var(--line); color:var(--muted); transition:all .3s; }
+  .syncbadge.g1.fire{ border-color:var(--amber); color:#ffe2ab; background:#1d1808; box-shadow:0 0 16px rgba(210,153,34,.45); transform:scale(1.04); }
+  .syncbadge.g2.fire{ border-color:var(--green); color:#c4f7d4; background:#0f2a18; box-shadow:0 0 16px rgba(63,185,80,.45); transform:scale(1.04); }
+  .consist-flag{ text-align:center; font-weight:700; font-size:14px; min-height:20px; margin-top:4px; }
+  .consist-flag.ok{ color:var(--green); } .consist-flag.bad{ color:var(--red); }
+
+  /* ---------- tab6: PrefetchAck alignment & anti-hang ---------- */
+  #scene6 .ackmesh{ display:flex; flex-direction:column; gap:8px; margin-bottom:6px; }
+  .ackrow{ display:flex; align-items:center; gap:10px; border:1px solid var(--line); border-radius:10px; padding:6px 10px; background:var(--panel2); transition:all .3s; }
+  .ackrow.blocked{ border-color:var(--red); box-shadow:0 0 12px rgba(248,81,73,.3); }
+  .acklabel{ width:190px; font-size:12px; color:var(--muted); }
+  .acklabel b{ color:var(--text); }
+  .ackslots{ flex:1; display:flex; gap:8px; }
+  .ackchip{ flex:1; height:34px; border-radius:8px; border:1px solid var(--line); background:#0d1117;
+    display:flex; align-items:center; justify-content:center; font-size:11px; color:#5b6470; gap:5px;
+    transition:all .3s; position:relative; }
+  .ackchip .err{ font-size:9px; color:var(--red); }
+  .ackchip.pending{ opacity:.4; }
+  .ackchip.emit{ border-color:var(--blue); color:#cfe6ff; background:linear-gradient(180deg,#143055,#102844); box-shadow:0 0 10px rgba(88,166,255,.4); }
+  .ackchip.passed{ border-color:var(--green); color:#c4f7d4; background:linear-gradient(180deg,#0f3a1d,#0e2c18); }
+  .ackchip.wait{ border-color:var(--amber); color:#ffe2ab; background:#1d1808; animation:pulse 1s infinite; }
+  .ackchip.missing{ border-color:var(--red); border-style:dashed; color:#ffd4d0; background:#2a1010; }
+  .barriers{ border:1px dashed var(--line); border-radius:10px; padding:8px 10px; margin-top:4px; }
+  .barlabel{ font-size:12px; color:var(--muted); margin-bottom:6px; text-align:center; }
+  .barrow{ display:flex; gap:8px; align-items:stretch; }
+  .barrow .barspacer{ width:190px; }
+  .barcols{ flex:1; display:flex; gap:8px; }
+  .bar{ flex:1; border:1.5px solid var(--line); border-radius:8px; padding:6px 4px; text-align:center;
+    font-size:11px; color:var(--muted); transition:all .3s; }
+  .bar .bcount{ display:block; font-size:13px; font-weight:800; margin-top:2px; color:#5b6470; }
+  .bar.waiting{ border-color:var(--amber); color:#ffe2ab; background:#1d1808; animation:pulse 1s infinite; }
+  .bar.waiting .bcount{ color:#ffe2ab; }
+  .bar.fired{ border-color:var(--green); color:#c4f7d4; background:#0f2a18; }
+  .bar.fired .bcount{ color:#c4f7d4; }
+  .bar.dead{ border-color:var(--red); color:#ffd4d0; background:#2a1010; box-shadow:0 0 12px rgba(248,81,73,.4); }
+  .bar.dead .bcount{ color:#ffd4d0; }
+</style>
+</head>
+<body>
+<button id="langBtn" onclick="toggleLang()">EN</button>
+<header>
+  <h1>HiCache × Pipeline Parallel：树一致性 & 防死锁</h1>
+  <p>拓扑 <b>PP=3 × TP=8 = 24 ranks</b> · 行=TP 组、列=PP 组 · MIN all-reduce 保证 radix tree 一致 · 2 套 gloo 组避免后台 collective 死锁</p>
+</header>
+
+<div class="tabs">
+  <button class="tab active" data-tab="story">① 两请求全流程（L3 命中/未命中 · host tree 一致）</button>
+  <button class="tab" data-tab="consistency">② 树一致性（自动播放）</button>
+  <button class="tab" data-tab="deadlock">③ 为什么 2 个组不死锁</button>
+  <button class="tab" data-tab="skew">④ 异步时间差 × MIN 统一步调</button>
+  <button class="tab" data-tab="threads">⑤ 线程关系 & 树一致性</button>
+  <button class="tab" data-tab="ackalign">⑥ PrefetchAck 对齐 & 防 hang</button>
+</div>
+
+<div class="wrap">
+  <!-- ============ TAB 5 (story, shown first) ============ -->
+  <div class="scene" id="scene5">
+    <div class="legend">
+      <span class="chip"><span class="sw" style="background:var(--blue)"></span>GPU 计算 / 插入中</span>
+      <span class="chip"><span class="sw" style="background:var(--cyan)"></span>match 命中前缀</span>
+      <span class="chip"><span class="sw" style="background:var(--amber)"></span>各 rank 不一致（待 MIN 统一）</span>
+      <span class="chip"><span class="sw" style="background:var(--green)"></span>已提交 / 一致</span>
+      <span class="chip"><span class="sw" style="background:var(--red)"></span>未命中 / 淘汰删除</span>
+    </div>
+    <div class="story-top">
+      <div class="gpu-badge" id="gpuBadge"><span class="ic">▣</span><span>GPU 计算</span></div>
+      <div class="l3box" id="l3box">
+        <div class="l3title"><span class="l3lab"><b>L3 持久化存储</b>（storage backend，3 个 rank 共享视图）</span><span class="badge" id="l3badge"></span></div>
+        <div class="pagerow" id="l3pages"></div>
+      </div>
+    </div>
+    <div class="ranks" id="ranks"></div>
+    <div class="story-sync">
+      <div class="syncbadge g1" id="s5sync1">◆ MIN 组1 · prefetch_hits_sync_groups · storage_hit_count</div>
+      <div class="syncbadge g2" id="s5sync2">◆ MIN 组2 · prefetch_completion_sync_groups · completed_tokens</div>
+    </div>
+    <div class="consist-flag" id="s5flag"></div>
+    <div class="caption" id="cap5">自动播放中…</div>
+  </div>
+  <div class="controls" id="ctl5">
+    <button class="ctl primary" id="play5">⏸ 暂停</button>
+    <button class="ctl" id="replay5">⟲ 重播</button>
+  </div>
+
+  <!-- ============ TAB 1 (#scene1; the "consistency" tab, labeled ② in the tab bar) ============ -->
+  <div class="scene hidden" id="scene1">
+    <div class="legend">
+      <span class="chip"><span class="sw" style="background:var(--amber)"></span>命中数被截断（不一致）</span>
+      <span class="chip"><span class="sw" style="background:var(--blue)"></span>TP 组内 MIN 后</span>
+      <span class="chip"><span class="sw" style="background:var(--green)"></span>PP 组内 MIN 后（全局一致）</span>
+    </div>
+    <div id="mesh1"></div>
+    <div class="tree-box">
+      <div class="tree-title" id="treeTitle">所有 24 个 rank 共享同一棵 radix tree</div>
+      <div class="tree" id="sharedTree"></div>
+    </div>
+    <div class="caption" id="cap1">自动播放中…</div>
+  </div>
+  <div class="controls">
+    <button class="ctl primary" id="play1">⏸ 暂停</button>
+    <button class="ctl" id="replay1">⟲ 重播</button>
+  </div>
+
+  <!-- ============ TAB 2 (#scene2; the "deadlock" tab, labeled ③ in the tab bar) ============ -->
+  <div class="scene hidden" id="scene2">
+    <div class="legend">
+      <span class="chip"><span class="sw" style="background:var(--purple)"></span><b>prefetch_thread</b>（独立后台线程）· reduce(storage_hit_count)</span>
+      <span class="chip"><span class="sw" style="background:var(--cyan)"></span><b>prefetch_sync_thread</b>（独立后台线程）· reduce(completed_tokens)</span>
+    </div>
+    <div class="note" style="margin-bottom:8px;">每个 cell = 1 个 rank，内含 2 个独立后台线程（小圆点 ●A ●B）。每一行是一个 <b>TP communicator</b>，每一列是一个 <b>PP communicator</b>。</div>
+    <div id="mesh2"></div>
+    <div class="groups">
+      <div class="grp g1" id="g1"><b>prefetch_hits_sync_groups</b><br>命中页数归约组（含 TP 环 + PP 环）<br><span style="font-size:11px">reduce(storage_hit_count)</span></div>
+      <div class="grp g2" id="g2"><b>prefetch_completion_sync_groups</b><br>完成 token 归约组（含 TP 环 + PP 环）<br><span style="font-size:11px">reduce(completed_tokens)</span></div>
+    </div>
+    <div class="banner" id="banner2"></div>
+    <div class="caption" id="cap2">选择场景：用 <b>1 套组</b> 会死锁，用 <b>2 套组</b> 则安全。</div>
+  </div>
+  <div class="controls" id="ctl2">
+    <button class="ctl alt" id="play1grp">▶ 1 套组（死锁）</button>
+    <button class="ctl primary" id="play2grp">▶ 2 套组（安全）</button>
+    <button class="ctl" id="reset2">重置</button>
+  </div>
+
+  <!-- ============ TAB 3 (#scene3; the "skew" tab, labeled ④ in the tab bar) ============ -->
+  <div class="scene hidden" id="scene3">
+    <!-- layer 3: pipeline timing (top) -->
+    <div class="t3-section">
+      <div class="t3-title">③ 主 PP 流水线执行<strong>时序</strong> <span class="tag gpu">NCCL · GPU</span>
+        <span style="color:var(--muted);font-size:11px;">时序连续、错峰流动，<strong style="color:var(--green)">不被后台 prefetch 同步打断</strong></span>
+      </div>
+      <div class="pipe" id="pipe">
+        <div class="lane s0"><span class="lname">PP stage 0</span><span class="lane-hint" id="hint0"></span></div>
+        <div class="lane s1"><span class="lname">PP stage 1</span><span class="lane-hint" id="hint1"></span></div>
+        <div class="lane s2"><span class="lname">PP stage 2</span><span class="lane-hint" id="hint2"></span></div>
+      </div>
+      <div class="flow-note">↑ 流水线跑的正是②组好的 <strong>mb0→mb3</strong>，沿 stage0→1→2 错峰对角推进</div>
+    </div>
+
+    <div class="t3-conduit" id="arrow1">
+      <span class="clabel">▲ 组好的 <b>batch &amp; micro-batch 顺序</b> 喂给流水线（内容）</span>
+      <span class="spark"></span><span class="spark s2"></span>
+    </div>
+
+    <!-- layer 2: batch former (middle) -->
+    <div class="t3-section">
+      <div class="t3-title">② 三个 PP rank 用<strong>同一个 storage hit</strong> 组 batch（内容必须逐 rank 一致）</div>
+      <div class="formers" id="formers"></div>
+    </div>
+
+    <div class="t3-conduit" id="arrow2">
+      <span class="clabel">▲ <code>all_reduce(MIN)</code> 输出统一值 <b>6</b> → 决定 batch size</span>
+      <span class="spark"></span><span class="spark s2"></span>
+    </div>
+
+    <!-- layer 1: async prefetch + MIN (bottom = causal source) -->
+    <div class="t3-section">
+      <div class="t3-title">① 异步 prefetch 查询 → <code>all_reduce(MIN)</code> <span class="tag cpu">gloo · CPU 后台线程</span></div>
+      <div class="sync-area" id="syncArea">
+        <div class="clock" id="t3clock">t = 0.0s</div>
+        <div class="barrier" id="barrier" style="left:62%"><span class="blabel">all_reduce(MIN)</span></div>
+        <div class="slabel" id="sl0">rank pp0</div><div class="pkt travel" id="pkt0"><span>pp0 storage hit</span><span class="hv">8</span></div>
+        <div class="slabel" id="sl1">rank pp1</div><div class="pkt travel" id="pkt1"><span>pp1 storage hit</span><span class="hv">6</span></div>
+        <div class="slabel" id="sl2">rank pp2</div><div class="pkt travel" id="pkt2"><span>pp2 storage hit</span><span class="hv">7</span></div>
+      </div>
+    </div>
+    <div class="caption" id="cap3">自动播放中…</div>
+  </div>
+  <div class="controls" id="ctl3">
+    <button class="ctl primary" id="play3">⏸ 暂停</button>
+    <button class="ctl" id="replay3">⟲ 重播</button>
+  </div>
+
+  <!-- ============ TAB 4 (#scene4; the "threads" tab, labeled ⑤ in the tab bar) ============ -->
+  <div class="scene hidden" id="scene4">
+    <div class="t4wrap">
+      <!-- left: thread data-flow -->
+      <div class="t4flow">
+        <div class="tbox sched" id="t4b0">
+          <div class="tname">调度器 Scheduler <span class="pin">主线程</span></div>
+          <div class="tdesc">发起 prefetch 请求（writeback / load）</div>
+        </div>
+        <div class="tarrow" id="t4a0">▼ <b>prefetch_queue</b>（PrefetchOperation）</div>
+
+        <div class="tbox thread-hit" id="t4b1">
+          <div class="tname">① prefetch_thread <span class="pin">storage-hit thread</span></div>
+          <div class="tdesc"><code>_storage_hit_query()</code> queries how many pages hit in L3; enough hits → prefetch_buffer, too few → prefetch_revoke_queue</div>
+        </div>
+        <div class="minnode" id="t4m1">◆ all_reduce(MIN) storage_hit_count
+          <small>@ prefetch_hits_sync_groups (group set 1, gloo/CPU, covering TP rings + PP rings)</small></div>
+        <div class="tarrow" id="t4a1">▼ <b>prefetch_buffer</b></div>
+
+        <div class="tbox thread-io" id="t4b2">
+          <div class="tname">② prefetch_io_aux_thread <span class="pin">IO load thread</span></div>
+          <div class="tdesc"><code>_page_transfer()</code> reads pages from L3 into host memory batch by batch; accumulates <b>completed_tokens</b>; <b>each storage batch produces exactly 1 PrefetchAck</b> (even on error)</div>
+        </div>
+        <div class="tarrow" id="t4a2">▼ <b>prefetch_sync_queue</b> (PrefetchAck)</div>
+
+        <div class="tbox thread-sync" id="t4b3">
+          <div class="tname">③ prefetch_sync_thread <span class="pin">completion-token thread</span></div>
+          <div class="tdesc">Reduces the <b>completed_tokens</b> of each ack</div>
+        </div>
+        <div class="minnode g2" id="t4m2">◆ all_reduce(MIN) completed_tokens
+          <small>@ prefetch_completion_sync_groups (group set 2, gloo/CPU, covering TP rings + PP rings)</small></div>
+        <div class="tarrow" id="t4a3">▼ <b>ack_prefetch_queue</b></div>
+
+        <div class="tbox sched" id="t4b4">
+          <div class="tname">Scheduler writes to the host radix tree</div>
+          <div class="tdesc">Inserts only the prefix of length <b>completed_tokens</b> → <code>_insert_helper_host()</code></div>
+        </div>
+      </div>
+
+      <!-- right: why consistent -->
+      <div class="t4why">
+        <div class="whycard a" id="t4wa">
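+          <h4>Why does MIN(storage_hit) give consistency?</h4>
+          <p>Hit counts can differ across ranks (truncation under host memory pressure, differing views of L3). MIN selects the <b>longest prefix that every rank can hit</b> → all ranks <b>fetch the same range</b> instead of each fetching a different length.</p>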
+        </div>
+        <div class="whycard b" id="t4wb">
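+          <h4>Why does MIN(completed_tokens) give consistency?</h4>
+          <p>Even with an identical fetch range, the page-by-page load can still <b>partially fail</b> (<code>page_get</code> returns n≠batch). MIN commits only the <b>longest common prefix that every rank loaded successfully</b> → the length written to the host tree is the same on every rank.</p>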
+        </div>
+        <div class="whycard c" id="t4wc">
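+          <h4>Why is there no hang?</h4>
+          <p>Every storage batch <b>produces exactly one PrefetchAck</b> (even on error) → every rank joins <b>exactly the same number</b> of reduces, so the collectives pair up one to one. Together the two MINs guarantee that <b>the prefix inserted into the host tree is identical on every rank → the trees stay consistent</b>.</p>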
+        </div>
+      </div>
+    </div>
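+    <div class="caption" id="cap4">Two MIN sync points (hit count on group set 1, completion count on group set 2) plus exactly 1 ack per batch together guarantee that the host radix trees on all PP ranks are strictly consistent.</div>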
+  </div>
+  <div class="controls" id="ctl4">
+    <button class="ctl primary" id="play4">⏸ Pause</button>
+    <button class="ctl" id="replay4">⟲ Replay</button>
+  </div>
+
+  <!-- ============ TAB 6 : PrefetchAck alignment & anti-hang ============ -->
+  <div class="scene hidden" id="scene6">
+    <div class="note" style="margin-bottom:10px;">Each <b>storage batch</b> always produces <b>exactly 1 PrefetchAck</b> in <code>_page_transfer</code>; <code>prefetch_sync_thread</code> runs one <code>all_reduce(MIN)</code> on group set 2 for <strong>every ack</strong>. Hence <b>number of acks = number of batches = number of group-set-2 collectives</b>, and this count must be equal on every rank.</div>
+    <div class="legend">
+      <span class="chip"><span class="sw" style="background:var(--blue)"></span>ack produced (joins this round of reduce)</span>
+      <span class="chip"><span class="sw" style="background:var(--green)"></span>barrier has 3/3 → passes</span>
+      <span class="chip"><span class="sw" style="background:var(--amber)"></span>arrived, waiting for the missing peer</span>
+      <span class="chip"><span class="sw" style="background:var(--red)"></span>missing ack → the wait never ends</span>
+    </div>
+    <div class="ackmesh" id="ackmesh"></div>
+    <div class="barriers">
+      <div class="barlabel">◆ <code>all_reduce(MIN)</code> @ group set 2 (prefetch_completion_sync_groups) · one barrier per ack</div>
+      <div class="barrow"><div class="barspacer"></div><div class="barcols" id="barcols"></div></div>
+    </div>
+    <div class="banner" id="banner6"></div>
+    <div class="caption" id="cap6">Pick a scenario: <b>always 1 ack per batch</b> → counts align, safe; <b>break on error</b> → one ack goes missing → group-set-2 reduces fall out of step → hang.</div>
+  </div>
+  <div class="controls" id="ctl6">
+    <button class="ctl primary" id="play6good">▶ Correct (always 1 ack per batch)</button>
+    <button class="ctl alt" id="play6bad">▶ Wrong (break on error → missing ack)</button>
+    <button class="ctl" id="reset6">Reset</button>
+  </div>
+</div>
+
+<script>
+const PP=3, TP=8;
+// initial hit counts per (pp,tp). some truncated by host-mem pressure.
+const HITS=[
+  [8,8,8,8,8,8,8,8],
+  [8,8,6,8,8,8,8,7],   // stage1: rank(1,2)=6, rank(1,7)=7 truncated
+  [8,8,8,8,8,8,8,8],
+];
+const ROWMIN = HITS.map(r=>Math.min(...r));      // [8,6,8]
+const GMIN = Math.min(...ROWMIN);                // 6
+const sleep=ms=>new Promise(r=>setTimeout(r,ms));
+
+/* ---------- build a mesh ---------- */
+function buildMesh(containerId, withThreads){
+  const c=document.getElementById(containerId);
+  let html='<div class="mesh-head"><div class="tp-hdr">';
+  for(let t=0;t<TP;t++) html+=`<div class="th">TP ${t}</div>`;
+  html+='</div></div>';
+  for(let p=0;p<PP;p++){
+    html+=`<div class="pp-row"><div class="pp-label"><b>PP stage ${p}</b><br>(TP group)</div><div class="row-cells" id="${containerId}-row${p}">`;
+    for(let t=0;t<TP;t++){
+      html+=`<div class="cell" id="${containerId}-c${p}-${t}">
+        <div class="v">—</div>
+        <div class="rk">r${p*TP+t}</div>
+        ${withThreads?`<div class="tdots"><span class="td a" id="${containerId}-A-${p}-${t}"></span><span class="td b" id="${containerId}-B-${p}-${t}"></span></div>`:''}
+      </div>`;
+    }
+    html+='</div></div>';
+  }
+  // pp-group footer (columns)
+  html+='<div class="pp-foot"><div class="lab">PP groups (columns)<br>each column spans 3 stages →</div><div class="cols">';
+  for(let t=0;t<TP;t++) html+=`<div class="col" id="${containerId}-col${t}">r${t}·r${t+TP}·r${t+2*TP}</div>`;
+  html+='</div></div>';
+  c.innerHTML=html;
+}
+const cell=(m,p,t)=>document.getElementById(`${m}-c${p}-${t}`);
+const val =(m,p,t)=>cell(m,p,t).querySelector('.v');
+
+buildMesh('mesh1',false);
+buildMesh('mesh2',true);
+
+/* ============================================================
+   TAB 1 : auto-play loop
+   query -> diverge -> TP all_reduce(MIN) -> PP all_reduce(MIN) -> consistent tree
+   ============================================================ */
+let t1Token=0, t1Paused=false;
+const cap1=document.getElementById('cap1');
+const sharedTree=document.getElementById('sharedTree');
+
+function resetMesh1(){
+  for(let p=0;p<PP;p++) for(let t=0;t<TP;t++){
+    const cl=cell('mesh1',p,t); cl.className='cell';
+    val('mesh1',p,t).innerHTML='—';
+  }
+  sharedTree.innerHTML='';
+}
+async function gate(my){ while(t1Paused){ await sleep(120); if(my!==t1Token) throw 0; } }
+async function step(ms,my){ await sleep(ms); await gate(my); if(my!==t1Token) throw 0; }
+
+async function runTab1(){
+  const my=++t1Token;
+  try{
+    while(true){
+      resetMesh1();
+      cap1.innerHTML='Topology: <b>PP=3 × TP=8 = 24 ranks</b>; each PP stage hosts 8 TP ranks.';
+      await step(1600,my);
+
+      // 1) independent query
+      for(let p=0;p<PP;p++) for(let t=0;t<TP;t++){
+        const v=HITS[p][t]; const cl=cell('mesh1',p,t);
+        val('mesh1',p,t).innerHTML=v+' <small>pg</small>';
+        if(v!==8) cl.classList.add('varied');
+        await sleep(35);
+      }
+      cap1.innerHTML='① Each rank <span class="k">independently</span> queries L3 for prefix hits. <b style="color:var(--amber)">Note that r10 and r15 are truncated by host memory pressure</b> (6 / 7 pages).';
+      await step(2200,my);
+
+      // 2) diverge warning
+      cap1.innerHTML='② If every rank builds its radix tree from its own hit count → tree depths diverge → later PP collectives hit a <b style="color:var(--red)">shape mismatch → crash</b>.';
+      for(let p=0;p<PP;p++) for(let t=0;t<TP;t++) if(HITS[p][t]!==8){ cell('mesh1',p,t).classList.add('bad'); }
+      await step(2200,my);
+      for(let p=0;p<PP;p++) for(let t=0;t<TP;t++) cell('mesh1',p,t).classList.remove('bad','varied');
+
+      // 3) TP all_reduce(MIN) — sweep each row (all rows in parallel)
+      cap1.innerHTML='③ Step one: <code>all_reduce(MIN)</code> within each <span class="k">TP group (a row of 8 ranks)</span>.';
+      for(let t=0;t<TP;t++){
+        for(let p=0;p<PP;p++) cell('mesh1',p,t).classList.add('sweep');
+        await step(110,my);
+        for(let p=0;p<PP;p++) cell('mesh1',p,t).classList.remove('sweep');
+      }
+      for(let p=0;p<PP;p++) for(let t=0;t<TP;t++){
+        const cl=cell('mesh1',p,t); cl.classList.add('tpmin');
+        val('mesh1',p,t).innerHTML=ROWMIN[p]+' <small>pg</small>';
+      }
+      cap1.innerHTML='③ After the TP-group reduction: <b>each row is uniform</b> (PP0=8, PP1=6, PP2=8, the per-row minima).';
+      await step(1900,my);
+
+      // 4) PP all_reduce(MIN) — sweep each column (top->bottom)
+      cap1.innerHTML='④ Step two: <code>all_reduce(MIN)</code> within each <span class="k">PP group (a column of 3 ranks)</span> → converges to the global minimum.';
+      for(let p=0;p<PP;p++){
+        for(let t=0;t<TP;t++) cell('mesh1',p,t).classList.add('sweep');
+        await step(180,my);
+        for(let t=0;t<TP;t++) cell('mesh1',p,t).classList.remove('sweep');
+      }
+      for(let p=0;p<PP;p++) for(let t=0;t<TP;t++){
+        const cl=cell('mesh1',p,t); cl.classList.remove('tpmin'); cl.classList.add('gmin');
+        val('mesh1',p,t).innerHTML=GMIN+' <small>pg</small>';
+      }
+      cap1.innerHTML='④ After the PP-group reduction: <b style="color:var(--green)">all 24 ranks agree on a hit count of 6</b> (the longest common prefix).';
+      await step(1700,my);
+
+      // 5) shared consistent tree
+      cap1.innerHTML='⑤ Every rank prefetches and builds its tree only up to 6 pages → <span style="color:var(--green)">the radix trees on all 24 ranks are identical ✓</span>';
+      sharedTree.innerHTML='';
+      for(let i=0;i<GMIN;i++){
+        const n=document.createElement('div'); n.className='tnode'; n.textContent='page '+i; sharedTree.appendChild(n);
+        await step(120,my); n.classList.add('show');
+      }
+      await step(2600,my);
+    }
+  }catch(e){ /* cancelled */ }
+}
+
+document.getElementById('play1').onclick=function(){
+  t1Paused=!t1Paused;
+  this.textContent=t1Paused?'▶ Play':'⏸ Pause';
+};
+document.getElementById('replay1').onclick=()=>{ t1Paused=false; document.getElementById('play1').textContent='⏸ Pause'; runTab1(); };
+
+/* ============================================================
+   TAB 2 : why 2 group-sets avoid deadlock (PP×TP mesh)
+   ============================================================ */
+let t2Token=0;
+const cap2=document.getElementById('cap2');
+const banner2=document.getElementById('banner2');
+const g1=document.getElementById('g1'), g2=document.getElementById('g2');
+const row2=p=>document.getElementById('mesh2-row'+p);
+const col2=t=>document.getElementById('mesh2-col'+t);
+const dotEl=(op,p,t)=>document.getElementById(`mesh2-${op}-${p}-${t}`);
+
+function resetMesh2(){
+  ++t2Token;
+  for(let p=0;p<PP;p++){
+    row2(p).className='row-cells';
+    for(let t=0;t<TP;t++){
+      const cl=cell('mesh2',p,t); cl.className='cell';
+      val('mesh2',p,t).innerHTML=GMIN+' <small>pg</small>';
+      dotEl('A',p,t).className='td a'; dotEl('B',p,t).className='td b';
+    }
+  }
+  for(let t=0;t<TP;t++) col2(t).className='col';
+  g1.classList.remove('hot'); g2.classList.remove('hot'); g2.style.opacity=1;
+  banner2.className='banner'; banner2.textContent='';
+}
+async function s2(ms,my){ await sleep(ms); if(my!==t2Token) throw 0; }
+
+/* ---- 1 shared group set -> deadlock ---- */
+async function play1Group(){
+  resetMesh2(); const my=t2Token;
+  try{
+    g2.style.opacity=.25; g1.classList.add('hot');
+    cap2.innerHTML='Only <b>1 group set</b>: prefetch_thread (A) and prefetch_sync_thread (B) share the same set of communicators.';
+    await s2(800,my);
+
+    // each rank's two threads race: some submit A first (purple), some B first (cyan)
+    cap2.innerHTML='The two background threads are <b>scheduled independently, in no fixed order</b>: within one TP ring, some ranks issue A first and others issue B first.';
+    for(let p=0;p<PP;p++){
+      for(let t=0;t<TP;t++){
+        const aFirst=((p+t)%2===0);      // deterministic but mixed within each row
+        const first=aFirst?'A':'B';
+        dotEl(first,p,t).classList.add('on');
+      }
+    }
+    await s2(1200,my);
+
+    // rings cannot align -> red
+    cap2.innerHTML='On a single communicator the ranks submit <b style="color:var(--red)">different</b> collectives (A and B are misaligned) → the rendezvous never matches.';
+    for(let p=0;p<PP;p++){
+      row2(p).classList.add('ring-bad');
+      for(let t=0;t<TP;t++){
+        cell('mesh2',p,t).classList.add('bad');
+        const aFirst=((p+t)%2===0);
+        dotEl(aFirst?'A':'B',p,t).className=`td ${aFirst?'a':'b'} dead`;
+      }
+    }
+    for(let t=0;t<TP;t++) col2(t).classList.add('ring-bad');
+    await s2(700,my);
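+    banner2.className='banner bad'; banner2.textContent='💥 DEADLOCK: the entire 24-rank job hangs';
+    cap2.innerHTML='As soon as A and B interleave on any communicator, that ring deadlocks → PP/TP communication across the whole job stalls in a chain reaction.';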
+  }catch(e){}
+}
+
+/* ---- 2 group sets -> safe ---- */
+async function play2Groups(){
+  resetMesh2(); const my=t2Token;
+  try{
+    g1.classList.add('hot'); g2.classList.add('hot');
+    cap2.innerHTML='With <b>2 independent group sets</b>: <b style="color:var(--purple)">A always uses prefetch_hits_sync_groups</b> and <b style="color:var(--cyan)">B always uses prefetch_completion_sync_groups</b>.';
+    await s2(800,my);
+
+    // wave A: every rank's prefetch_thread submits A to group-set-1; TP rings + PP rings light purple
+    cap2.innerHTML='Wave one: the <b>prefetch_thread</b> on every rank submits A only on <code>prefetch_hits_sync_groups</code> → identical sequence.';
+    for(let p=0;p<PP;p++){
+      row2(p).classList.add('ring-a');
+      for(let t=0;t<TP;t++) dotEl('A',p,t).classList.add('on');
+    }
+    for(let t=0;t<TP;t++) col2(t).classList.add('ring-a');
+    await s2(1100,my);
+    for(let p=0;p<PP;p++){
+      row2(p).className='row-cells';
+      for(let t=0;t<TP;t++) dotEl('A',p,t).className='td a done';
+    }
+    for(let t=0;t<TP;t++) col2(t).className='col';
+    cap2.innerHTML='✓ A has arrived on all TP rings + PP rings → wave-one reduction complete.';
+    await s2(900,my);
+
+    // wave B: prefetch_sync_thread submits B to group-set-2; rings light cyan
+    cap2.innerHTML='Wave two: the <b>prefetch_sync_thread</b> on every rank submits B only on <code>prefetch_completion_sync_groups</code> → identical sequence.';
+    for(let p=0;p<PP;p++){
+      row2(p).classList.add('ring-b');
+      for(let t=0;t<TP;t++) dotEl('B',p,t).classList.add('on');
+    }
+    for(let t=0;t<TP;t++) col2(t).classList.add('ring-b');
+    await s2(1100,my);
+    for(let p=0;p<PP;p++){
+      row2(p).className='row-cells';
+      for(let t=0;t<TP;t++){ dotEl('B',p,t).className='td b done'; cell('mesh2',p,t).classList.add('gmin'); }
+    }
+    for(let t=0;t<TP;t++) col2(t).className='col';
+    banner2.className='banner ok'; banner2.textContent='✅ Safe: all 24 ranks aligned and done';
+    cap2.innerHTML='The collective sequence on each communicator is <b style="color:var(--green)">identical</b> on every rank (A → group set 1, B → group set 2, never crossing) → no deadlock.';
+  }catch(e){}
+}
+
+document.getElementById('play1grp').onclick=play1Group;
+document.getElementById('play2grp').onclick=play2Groups;
+document.getElementById('reset2').onclick=()=>{ resetMesh2(); cap2.innerHTML='Pick a scenario: <b>1 group set</b> deadlocks, <b>2 group sets</b> is safe.'; };
+
+/* ============================================================
+   TAB 3 : async PP skew  x  all_reduce(MIN) unifies pace
+   Top: continuous, skewed micro-batch pipeline (CSS infinite) — never pauses.
+   Bottom: time-driven (rAF). Each rank's prefetch op arrives at the MIN barrier
+   at a DIFFERENT wall-clock time (async skew). Early arrivals park & wait on the
+   gloo CPU group (background thread). When the slowest arrives, one MIN flash
+   unifies all three to 6 and they depart together — while the top pipeline keeps
+   flowing untouched.
+   ============================================================ */
+const NMB=4;                          // micro-batches per batch (illustrative)
+const MB_LABELS=Array.from({length:NMB},(_,k)=>'mb'+k);
+// build top pipeline micro-batches: one controllable block per (stage, mb)
+(function buildPipe(){
+  for(let s=0;s<3;s++){
+    const lane=document.querySelector('#pipe .s'+s);
+    for(let k=0;k<NMB;k++){
+      const mb=document.createElement('div');
+      mb.className='mb'; mb.id=`pmb-${s}-${k}`; mb.textContent=MB_LABELS[k];
+      lane.appendChild(mb);
+    }
+  }
+})();
+// build batch formers (one per PP rank): hit input + ordered mb chips
+(function buildFormers(){
+  const host=document.getElementById('formers');
+  for(let p=0;p<3;p++){
+    const f=document.createElement('div'); f.className='former'; f.id='former'+p;
+    let chips='';
+    for(let k=0;k<NMB;k++) chips+=`<div class="mbchip" id="fchip-${p}-${k}">${MB_LABELS[k]}</div>`;
+    f.innerHTML=`<h5>Scheduler · PP rank ${p}</h5>
+      <div class="hitbox"><span class="ht1">Cached-prefix storage hit = </span><b id="fhit${p}">?</b><span class="ht2"> pages → determines batch composition</span></div>
+      <div class="mbrow">${chips}</div>
+      <div class="chk" id="fchk${p}"></div>`;
+    host.appendChild(f);
+  }
+})();
+const arrow1=document.getElementById('arrow1');
+const arrow2=document.getElementById('arrow2');
+
+const PKT=[
+  { el:document.getElementById('pkt0'), lab:document.getElementById('sl0'), y:34,  hit:8, arrive:2.6 },
+  { el:document.getElementById('pkt1'), lab:document.getElementById('sl1'), y:85,  hit:6, arrive:1.9 }, // arrives first
+  { el:document.getElementById('pkt2'), lab:document.getElementById('sl2'), y:136, hit:7, arrive:3.9 }, // slowest -> everyone waits
+];
+const GMIN3=Math.min(...PKT.map(p=>p.hit)); // 6
+const T3={ START:0.4, SYNC:3.9, FLASH_END:4.5, DEPART_END:5.4,
+           DROP:4.5, MB_START:5.0, MB_STEP:0.35, READY:6.4, CYCLE:14.0,
+           PIPE_START:6.4, MB_LAG:1.0, STAGE_LAG:0.9, MB_TRAVEL:3.2,
+           X0:11, XB:62, X1:97 }; // seconds / percentages
+const cap3=document.getElementById('cap3');
+const barrier=document.getElementById('barrier');
+const t3clock=document.getElementById('t3clock');
+let t3raf=null, t3on=false, t3paused=false, t3start=0, t3lastT=0;
+
+function placePkt(p){ p.lab.style.top=p.y+'px'; p.el.style.top=p.y+'px'; }
+PKT.forEach(placePkt);
+function lerp(a,b,u){ return a+(b-a)*Math.max(0,Math.min(1,u)); }
+// returns progress 0..1 if t (or its wrapped form) is inside [st,en), else -1
+function pipeActive(st,en,t){
+  if(t>=st && t<en) return (t-st)/(en-st);
+  const tw=t+T3.CYCLE;
+  if(tw>=st && tw<en) return (tw-st)/(en-st);   // previous batch still draining across loop
+  return -1;
+}
+
+function resetFormers(){
+  arrow1.classList.remove('hot'); arrow2.classList.remove('hot');
+  for(let p=0;p<3;p++){
+    document.getElementById('fhit'+p).textContent='?';
+    document.getElementById('former'+p).classList.remove('ready');
+    document.getElementById('fchk'+p).textContent='';
+    for(let k=0;k<NMB;k++) document.getElementById(`fchip-${p}-${k}`).className='mbchip';
+  }
+}
+
+function t3frame(now){
+  if(!t3on){ return; }
+  if(!t3paused){
+    let t=((now - t3start)/1000) % T3.CYCLE;
+    t3lastT=t;
+    t3clock.textContent='t = '+t.toFixed(1)+'s';
+    barrier.classList.toggle('fire', (t>=T3.SYNC && t<T3.FLASH_END));
+
+    // --- layer 1: async packets toward MIN barrier ---
+    PKT.forEach(p=>{
+      let xPct, cls;
+      if(t < T3.START){ xPct=T3.X0; cls='travel'; }
+      else if(t < p.arrive){ xPct=lerp(T3.X0, T3.XB, (t-T3.START)/(p.arrive-T3.START)); cls='travel'; }
+      else if(t < T3.FLASH_END){ xPct=T3.XB; cls='wait'; }
+      else if(t < T3.DEPART_END){ xPct=lerp(T3.XB, T3.X1, (t-T3.FLASH_END)/(T3.DEPART_END-T3.FLASH_END)); cls='unified'; }
+      else { xPct=T3.X1; cls='unified'; }
+      p.el.style.left='calc('+xPct+'% - 54px)';
+      p.el.className='pkt '+cls;
+      p.el.querySelector('.hv').textContent = (t>=T3.SYNC? GMIN3 : p.hit);
+    });
+
+    // --- layer 2: batch formers driven by unified value ---
+    if(t < T3.FLASH_END){
+      resetFormers();
+    } else {
+      arrow2.classList.add('hot');                 // MIN -> value out
+      for(let p=0;p<3;p++) document.getElementById('fhit'+p).textContent=GMIN3;
+      // light mb chips in identical order across all three formers
+      let lit=0;
+      for(let k=0;k<NMB;k++){
+        if(t >= T3.MB_START + k*T3.MB_STEP){
+          for(let p=0;p<3;p++) document.getElementById(`fchip-${p}-${k}`).classList.add('on');
+          lit++;
+        }
+      }
+      if(t >= T3.READY){
+        arrow1.classList.add('hot');               // batch -> pipeline content
+        for(let p=0;p<3;p++){
+          document.getElementById('former'+p).classList.add('ready');
+          document.getElementById('fchk'+p).textContent='✓ batch & mb order match';
+          for(let k=0;k<NMB;k++) document.getElementById(`fchip-${p}-${k}`).className='mbchip fixed';
+        }
+      } else {
+        arrow1.classList.remove('hot');
+      }
+    }
+
+    // --- layer 3: the formed mb0..mb3 flow through stage0->1->2 (diagonal pipeline) ---
+    for(let s=0;s<3;s++){
+      for(let k=0;k<NMB;k++){
+        const b=document.getElementById(`pmb-${s}-${k}`);
+        const st=T3.PIPE_START + k*T3.MB_LAG + s*T3.STAGE_LAG;
+        const u=pipeActive(st, st+T3.MB_TRAVEL, t);   // handles cycle wrap (previous batch still draining)
+        if(u>=0){ b.style.left=(19+u*73)+'%'; b.style.opacity=1; }
+        else b.style.opacity=0;
+      }
+      document.getElementById('hint'+s).textContent = (t>1.4 && t<T3.PIPE_START) ? '(waiting for the batch formed in ②…)' : '';
+    }
+
+    // --- captions ---
+    if(t < T3.START) cap3.innerHTML='① The prefetch queries of the three PP ranks are <strong>issued asynchronously</strong> (they arrive at different times).';
+    else if(t < PKT[2].arrive) cap3.innerHTML='① Ranks that arrive early <b style="color:var(--amber)">wait for alignment</b> on the <span class="k">gloo CPU background thread</span> (no GPU time used).';
+    else if(t < T3.FLASH_END) cap3.innerHTML='① <b style="color:var(--amber)">pp2, the slowest,</b> arrives → <code>all_reduce(MIN)</code> turns 8/6/7 into a <strong style="color:var(--green)">unified 6</strong>.';
+    else if(t < T3.READY) cap3.innerHTML='② The unified <b>storage hit = 6</b> is handed to the scheduler on each rank → it determines the <strong>cached prefix length / batch size / micro-batch order</strong> (mb0→mb3).';
+    else cap3.innerHTML='③ Because all three ranks receive <strong style="color:var(--green)">the same 6</strong>, they form <strong style="color:var(--green)">exactly the same batch and mb order</strong> and feed it to the PP pipeline; execution continues without interruption.<br><span style="color:var(--red)">⚠ If the storage hit were not unified → batch/mb order would diverge across ranks → PP scheduling would fall out of step and hang.</span>';
+  }
+  t3raf=requestAnimationFrame(t3frame);
+}
+function startTab3(restart){
+  t3on=true;
+  if(restart || !t3start){ t3start=performance.now(); t3paused=false; document.getElementById('play3').textContent='⏸ Pause'; }
+  document.getElementById('pipe').classList.remove('paused');
+  if(!t3raf) t3raf=requestAnimationFrame(t3frame);
+}
+function stopTab3(){ t3on=false; if(t3raf){ cancelAnimationFrame(t3raf); t3raf=null; } }
+
+document.getElementById('play3').onclick=function(){
+  t3paused=!t3paused;
+  t3start=performance.now() - t3lastT*1000;   // re-anchor the clock at the frozen t (same on pause and on resume)
+  this.textContent=t3paused?'▶ Play':'⏸ Pause';
+  document.getElementById('pipe').classList.toggle('paused', t3paused);
+};
+document.getElementById('replay3').onclick=()=>startTab3(true);
+
+/* ============================================================
+   TAB 4 : animated walk-through of the prefetch thread pipeline.
+   Highlights each stage in sequence; the chain "lights up" as the
+   data (PrefetchOperation → Ack → completed_tokens) flows down, and
+   the matching right-side why-card glows at each MIN sync point.
+   ============================================================ */
+let t4Token=0, t4Paused=false;
+const cap4El=document.getElementById('cap4');
+// [ids-to-light, caption]
+const T4SEQ=[
+  [['t4b0'], 'The scheduler main thread enqueues a prefetch request (writeback / load), which kicks off the background pipeline.'],
+  [['t4a0'], 'A <b>PrefetchOperation</b> is pushed onto <code>prefetch_queue</code> and handed to the background threads.'],
+  [['t4b1'], '① <b>prefetch_thread</b> calls <code>_storage_hit_query()</code> to count the pages hit in L3 (the count may differ per rank).'],
+  [['t4m1','t4wa'], '◆ <b style="color:var(--amber)">First MIN</b>: take the minimum of storage_hit_count on <code>prefetch_hits_sync_groups</code> (group set 1) → <b>every rank fetches the same range</b>.'],
+  [['t4a1'], 'Requests with enough hits go into <code>prefetch_buffer</code> and proceed to the actual IO load.'],
+  [['t4b2'], '② <b>prefetch_io_aux_thread</b> uses <code>_page_transfer()</code> to move pages L3→host batch by batch; <b>each batch always produces 1 PrefetchAck</b> (even on error).'],
+  [['t4a2'], 'The <b>PrefetchAck</b> of each batch is pushed onto <code>prefetch_sync_queue</code>.'],
+  [['t4b3'], '③ <b>prefetch_sync_thread</b> reduces the <b>completed_tokens</b> of each ack.'],
+  [['t4m2','t4wb'], '◆ <b style="color:var(--green)">Second MIN</b>: take the minimum of completed_tokens on <code>prefetch_completion_sync_groups</code> (group set 2) → <b>the prefix actually loaded is identical on every rank</b>.'],
+  [['t4a3'], 'The unified result is pushed onto <code>ack_prefetch_queue</code> and returns to the scheduler.'],
+  [['t4b4','t4wc'], 'The scheduler inserts only the prefix of length <b>completed_tokens</b> → <code>_insert_helper_host()</code>. With exactly 1 ack per batch, <b>every rank performs the same number of reduces → no hang</b>.'],
+];
+function t4clear(){
+  document.querySelectorAll('#scene4 .lit').forEach(e=>e.classList.remove('lit'));
+  document.querySelectorAll('#scene4 .dimmed').forEach(e=>e.classList.remove('dimmed'));
+}
+async function t4gate(my){ while(t4Paused){ await sleep(120); if(my!==t4Token) throw 0; } }
+async function t4step(ms,my){ await sleep(ms); await t4gate(my); if(my!==t4Token) throw 0; }
+async function runTab4(){
+  const my=++t4Token;
+  try{
+    while(true){
+      t4clear();
+      cap4El.innerHTML='Stages light up in order down the data flow: two MIN sync points + exactly 1 ack per batch.';
+      await t4step(1200,my);
+      for(const [ids,cap] of T4SEQ){
+        await t4gate(my);
+        ids.forEach(id=>document.getElementById(id).classList.add('lit'));
+        cap4El.innerHTML=cap;
+        await t4step(1900,my);
+      }
+      cap4El.innerHTML='✅ Loop closed: two MINs (hit count on group set 1 + completion count on group set 2) + 1 ack per batch → <b style="color:var(--green)">the host radix trees on all PP ranks are strictly consistent</b>.';
+      await t4step(2600,my);
+    }
+  }catch(e){ /* cancelled */ }
+}
+function stopTab4(){ ++t4Token; }
+
+document.getElementById('play4').onclick=function(){
+  t4Paused=!t4Paused;
+  this.textContent=t4Paused?'▶ Play':'⏸ Pause';
+};
+document.getElementById('replay4').onclick=()=>{ t4Paused=false; document.getElementById('play4').textContent='⏸ Pause'; runTab4(); };
+
+/* ============================================================
+   TAB 5 : two-request full lifecycle (PP=3 × TP=4).
+   Req A misses (GPU compute → L2 insert → L3 backup), then L2 is
+   evicted (delete, identical across ranks); Req B hits L3 and goes
+   through the two MIN syncs so every PP rank inserts the SAME prefix
+   into its host radix tree → trees stay consistent, no deadlock.
+   ============================================================ */
+const NPG=4;                       // pages tracked in the story
+const RANK_NAMES=['PP rank 0','PP rank 1','PP rank 2'];
+(function buildStory(){
+  let h='';
+  for(let p=0;p<3;p++){
+    let tps=''; for(let t=0;t<4;t++) tps+=`<span class="tp" id="s5tp-${p}-${t}"></span>`;
+    let nodes='<span class="root">host root</span>';
+    for(let i=0;i<NPG;i++) nodes+=`<div class="htnode" id="s5n-${p}-${i}">p${i}</div>`;
+    h+=`<div class="ranklane" id="s5lane${p}">
+      <div class="rankhdr"><span class="rname">${RANK_NAMES[p]}</span><span class="tps">${tps}</span>
+        <span class="rstat" id="s5stat${p}">idle</span></div>
+      <div class="htree">${nodes}</div></div>`;
+  }
+  document.getElementById('ranks').innerHTML=h;
+  let l3=''; for(let i=0;i<NPG;i++) l3+=`<div class="pg" id="s5l3-${i}">p${i}</div>`;
+  document.getElementById('l3pages').innerHTML=l3;
+})();
+
+let t5Token=0, t5Paused=false;
+const cap5=document.getElementById('cap5');
+const s5n=(p,i)=>document.getElementById(`s5n-${p}-${i}`);
+const setNode=(p,i,cls)=>{ s5n(p,i).className='htnode show '+cls; };
+const hideNode=(p,i)=>{ s5n(p,i).className='htnode'; };
+function rstat(p,txt,cls){ const e=document.getElementById('s5stat'+p); e.className='rstat '+(cls||''); e.textContent=txt; }
+function s5flag(txt,cls){ const e=document.getElementById('s5flag'); e.className='consist-flag '+(cls||''); e.innerHTML=txt; }
+function s5reset(){
+  for(let p=0;p<3;p++){
+    document.getElementById('s5lane'+p).className='ranklane';
+    rstat(p,'idle','');
+    for(let t=0;t<4;t++) document.getElementById(`s5tp-${p}-${t}`).className='tp';
+    for(let i=0;i<NPG;i++) hideNode(p,i);
+  }
+  for(let i=0;i<NPG;i++) document.getElementById('s5l3-'+i).className='pg';
+  document.getElementById('gpuBadge').className='gpu-badge';
+  document.getElementById('l3box').className='l3box';
+  document.getElementById('l3badge').className='badge'; document.getElementById('l3badge').textContent='';
+  document.getElementById('s5sync1').className='syncbadge g1';
+  document.getElementById('s5sync2').className='syncbadge g2';
+  s5flag('','');
+}
+async function t5gate(my){ while(t5Paused){ await sleep(120); if(my!==t5Token) throw 0; } }
+async function t5step(ms,my){ await sleep(ms); await t5gate(my); if(my!==t5Token) throw 0; }
+const allRanks=fn=>{ for(let p=0;p<3;p++) fn(p); };
+
+async function runTab5(){
+  const my=++t5Token;
+  try{
+    while(true){
+      s5reset();
+      cap5.innerHTML='Scenario <b>PP=3 × TP=4</b>: each PP rank maintains an <b>L2 host radix tree</b> on top of shared <b>L3 persistent storage</b>. We follow two requests to see how the host trees stay consistent.';
+      await t5step(2000,my);
+
+      /* ===== ACT 1 : Request A — miss → GPU → L2 insert → L3 backup ===== */
+      cap5.innerHTML='① <b>Request A</b> arrives (it needs a 4-page prefix) and all 3 PP ranks process it together.';
+      allRanks(p=>{ document.getElementById('s5lane'+p).classList.add('active'); for(let t=0;t<4;t++) document.getElementById(`s5tp-${p}-${t}`).className='tp on'; rstat(p,'req A',''); });
+      await t5step(1700,my);
+
+      cap5.innerHTML='① L2 host tree lookup → <b style="color:var(--red)">miss</b>; L3 lookup → <b style="color:var(--red)">miss</b> (storage is empty).';
+      allRanks(p=>rstat(p,'L2/L3 miss','miss'));
+      document.getElementById('l3badge').className='badge miss'; document.getElementById('l3badge').textContent='miss';
+      await t5step(1900,my);
+
+      cap5.innerHTML='① Fall back to the <b>GPU forward pass</b> to compute the KV for these 4 pages.';
+      document.getElementById('gpuBadge').classList.add('busy');
+      allRanks(p=>rstat(p,'compute','warn'));
+      await t5step(1800,my);
+      document.getElementById('gpuBadge').classList.remove('busy');
+
+      cap5.innerHTML='① The results are written to the <b>L2 host radix tree</b> → all 3 ranks <code>insert</code> the <strong style="color:var(--green)">same</strong> prefix p0–p3.';
+      for(let i=0;i<NPG;i++){ allRanks(p=>setNode(p,i,'inserting')); await t5step(240,my); }
+      allRanks(p=>{ for(let i=0;i<NPG;i++) setNode(p,i,'committed'); rstat(p,'L2: 4','hit'); });
+      s5flag('✓ All 3 host trees insert the same 4 pages (consistent)','ok');
+      await t5step(1600,my);
+
+      cap5.innerHTML='① The backup thread persists L2 → <b>L3</b> (<code>write_backup</code> / <code>page_set</code>).';
+      document.getElementById('l3box').classList.add('hot');
+      document.getElementById('l3badge').className='badge hit'; document.getElementById('l3badge').textContent='stored';
+      for(let i=0;i<NPG;i++){ document.getElementById('s5l3-'+i).className='pg show l3'; await t5step(220,my); }
+      s5flag('','');
+      await t5step(1500,my);
+
+      /* ===== ACT 1.5 : L2 eviction (delete consistency) ===== */
+      cap5.innerHTML='② Host memory pressure → L2 triggers <strong style="color:var(--red)">eviction</strong> (<code>evict_host</code>). The 3 host trees are <b>identical</b> → eviction removes the <strong>same set of nodes</strong> on each; L3 still holds the pages.';
+      for(let i=NPG-1;i>=0;i--){ allRanks(p=>setNode(p,i,'evict')); await t5step(330,my); allRanks(p=>hideNode(p,i)); }
+      allRanks(p=>rstat(p,'L2 empty',''));
+      s5flag('✓ All 3 host trees delete the same nodes (delete is consistent)','ok');
+      await t5step(2000,my);
+      s5flag('','');
+
+      /* ===== ACT 2 : Request B — L3 hit → 2 MIN syncs → consistent insert ===== */
+      cap5.innerHTML='③ <b>Request B</b> arrives (reusing the prefix of A). The L2 host tree is now empty → <b style="color:var(--red)">L2 miss</b>, so the lookup moves to L3.';
+      allRanks(p=>{ document.getElementById('s5lane'+p).classList.add('active'); rstat(p,'req B','warn'); });
+      await t5step(1700,my);
+
+      cap5.innerHTML='③ The <b>prefetch_thread</b> on each rank queries L3 for its hit page count → results can <strong style="color:var(--amber)">differ</strong> (different host views / memory): 4 / 3 / 4.';
+      const hitq=[4,3,4];
+      allRanks(p=>{ rstat(p,'L3 hit '+hitq[p], hitq[p]===4?'hit':'warn'); for(let i=0;i<hitq[p];i++) setNode(p,i,'warn'); });
+      s5flag('⚠ 查询长度不一致（4/3/4）→ 若各自建树，host tree 会发散','bad');
+      await t5step(2600,my);
+
+      cap5.innerHTML='◆ <b>第一个 MIN</b> @ <code>prefetch_hits_sync_groups</code>（组1，gloo/CPU，含 TP环+PP环）：<code>all_reduce(MIN)</code> 统一查询长度 = <b>3</b>。';
+      document.getElementById('s5sync1').classList.add('fire');
+      await t5step(1500,my);
+      allRanks(p=>{ for(let i=0;i<NPG;i++) hideNode(p,i); for(let i=0;i<3;i++) setNode(p,i,'matched'); rstat(p,'match 3','hit'); });
+      s5flag('✓ 抓取范围统一 = 3 → match_prefix 逐 rank 一致','ok');
+      await t5step(2200,my);
+
+      cap5.innerHTML='③ <b>prefetch_io_aux_thread</b> 逐 batch 把 page 从 L3 拉回 L2（<code>_page_transfer</code>），每 batch 产 1 个 PrefetchAck。';
+      document.getElementById('s5sync1').classList.remove('fire');
+      for(let i=0;i<3;i++){ allRanks(p=>setNode(p,i,'inserting')); await t5step(300,my); }
+      await t5step(700,my);
+
+      cap5.innerHTML='③ 逐页加载<strong style="color:var(--amber)">部分失败</strong>：rank2 第 3 页 <code>page_get</code> 未成功 → completed_tokens = 3 / 3 / 2。';
+      const done=[3,3,2];
+      allRanks(p=>{ for(let i=0;i<3;i++){ if(i<done[p]) setNode(p,i,'committed'); else setNode(p,i,'warn'); } rstat(p,'done '+done[p], done[p]===3?'hit':'warn'); });
+      s5flag('⚠ 实际落盘不一致（3/3/2）→ 若各自插入，host tree 会发散','bad');
+      await t5step(2600,my);
+
+      cap5.innerHTML='◆ <b>第二个 MIN</b> @ <code>prefetch_completion_sync_groups</code>（组2，<b>独立 communicator</b>）：<code>all_reduce(MIN)</code> 统一 completed_tokens = <b>2</b>。';
+      document.getElementById('s5sync2').classList.add('fire');
+      await t5step(1500,my);
+
+      cap5.innerHTML='③ 各 rank 只把统一的 <b>2 个 page</b> 插入 L2 host tree（<code>_insert_helper_host</code>）→ 3 棵 host tree <strong style="color:var(--green)">再次完全一致</strong>。';
+      allRanks(p=>{ for(let i=0;i<NPG;i++) hideNode(p,i); for(let i=0;i<2;i++) setNode(p,i,'committed'); rstat(p,'L2: 2','hit'); });
+      s5flag('✓ 插入长度统一 = 2 → 3 棵 host tree 完全一致','ok');
+      await t5step(2400,my);
+
+      cap5.innerHTML='✅ 两套<strong>独立 gloo 组</strong>（组1 命中数、组2 完成数）+ 每 batch 恒 1 个 ack → 各 rank 对 host tree 的<strong>插入/删除完全一致</strong> → <b style="color:var(--green)">host radix tree 始终一致，后台 collective 不会死锁</b>。';
+      document.getElementById('s5sync1').classList.add('fire');
+      await t5step(3400,my);
+      document.getElementById('s5sync1').classList.remove('fire');
+      document.getElementById('s5sync2').classList.remove('fire');
+      await t5step(700,my);
+    }
+  }catch(e){ /* cancelled */ }
+}
+function stopTab5(){ ++t5Token; }
+
+document.getElementById('play5').onclick=function(){
+  t5Paused=!t5Paused;
+  this.textContent=t5Paused?'▶ 播放':'⏸ 暂停';
+};
+document.getElementById('replay5').onclick=()=>{ t5Paused=false; document.getElementById('play5').textContent='⏸ 暂停'; runTab5(); };
+
+/* ============================================================
+   TAB 6 : PrefetchAck count alignment & anti-hang.
+   Each storage batch in _page_transfer emits exactly one PrefetchAck,
+   and prefetch_sync_thread does one all_reduce(MIN) on set-2 per ack.
+   So #acks == #batches == #set-2 collectives, and it must be equal on
+   every rank. We compare: (good) each batch always emits 1 ack even on
+   error → counts aligned → safe; (bad) break-on-error drops an ack →
+   one rank does fewer reduces → the others block forever → hang.
+   ============================================================ */
+const T6NB=3;                 // number of storage batches / acks
+(function buildAck(){
+  let m='';
+  for(let p=0;p<3;p++){
+    let slots='';
+    for(let k=0;k<T6NB;k++) slots+=`<div class="ackchip pending" id="ack-${p}-${k}">ack${k}</div>`;
+    m+=`<div class="ackrow" id="ackrow${p}"><div class="acklabel"><b>PP rank ${p}</b> · _page_transfer</div><div class="ackslots">${slots}</div></div>`;
+  }
+  document.getElementById('ackmesh').innerHTML=m;
+  let b='';
+  for(let k=0;k<T6NB;k++) b+=`<div class="bar" id="bar${k}">barrier ${k}<span class="bcount" id="bcnt${k}">0/3</span></div>`;
+  document.getElementById('barcols').innerHTML=b;
+})();
+
+let t6Token=0;
+const cap6=document.getElementById('cap6');
+const banner6=document.getElementById('banner6');
+const ackEl=(p,k)=>document.getElementById(`ack-${p}-${k}`);
+const barEl=k=>document.getElementById('bar'+k);
+const bcnt=k=>document.getElementById('bcnt'+k);
+function t6reset(){
+  ++t6Token;
+  for(let p=0;p<3;p++){
+    document.getElementById('ackrow'+p).className='ackrow';
+    for(let k=0;k<T6NB;k++){ ackEl(p,k).className='ackchip pending'; ackEl(p,k).innerHTML=`ack${k}`; }
+  }
+  for(let k=0;k<T6NB;k++){ barEl(k).className='bar'; bcnt(k).textContent='0/3'; }
+  banner6.className='banner'; banner6.textContent='';
+}
+// Sleep, then abort this run (throw the sentinel 0) if a newer run has bumped t6Token.
+async function s6(ms,my){ await sleep(ms); if(my!==t6Token) throw 0; }
+
+// nacks: how many acks each rank produces (rank index -> count)
+async function playAck(nacks, label){
+  t6reset(); const my=t6Token;
+  try{
+    cap6.innerHTML=label;
+    await s6(700,my);
+    for(let k=0;k<T6NB;k++){
+      barEl(k).classList.add('waiting');
+      let arrived=0;
+      // ranks emit ack k one by one (async arrival)
+      for(let p=0;p<3;p++){
+        await s6(520,my);
+        if(k<nacks[p]){
+          ackEl(p,k).className='ackchip emit';
+          arrived++; bcnt(k).textContent=arrived+'/3';
+        }
+      }
+      await s6(400,my);
+      if(arrived===3){
+        barEl(k).className='bar fired'; bcnt(k).textContent='3/3 ✓';
+        for(let p=0;p<3;p++) ackEl(p,k).className='ackchip passed';
+        cap6.innerHTML=(window.TR||((z,e)=>z))(
+          `barrier ${k}：3 个 rank 的 ack 都到齐 → <code>all_reduce(MIN)</code> 返回 → 本轮通过。`,
+          `barrier ${k}: all 3 ranks' acks arrived → <code>all_reduce(MIN)</code> returns → this round passes.`);
+        await s6(700,my);
+      }else{
+        // a rank is missing this ack -> collective can never complete
+        barEl(k).className='bar dead'; bcnt(k).textContent=arrived+'/3 ✗';
+        for(let p=0;p<3;p++){
+          if(k<nacks[p]){ ackEl(p,k).className='ackchip wait'; document.getElementById('ackrow'+p).classList.add('blocked'); }
+          else { ackEl(p,k).className='ackchip missing'; ackEl(p,k).innerHTML=`ack${k}<span class="err">${(window.TR||((z,e)=>z))('缺失','missing')}</span>`; }
+        }
+        cap6.innerHTML=(window.TR||((z,e)=>z))(
+          `barrier ${k}：只有 <b>${arrived}/3</b> 个 rank 进了 <code>all_reduce</code>（有 rank 早早 break、少产一个 ack）→ 已到达的 rank <b style="color:var(--amber)">永远阻塞</b>在这次 collective 上。`,
+          `barrier ${k}: only <b>${arrived}/3</b> ranks entered <code>all_reduce</code> (a rank broke early and emitted one fewer ack) → the ranks that arrived are <b style="color:var(--amber)">blocked forever</b> on this collective.`);
+        banner6.className='banner bad'; banner6.textContent='💥 HANG：组2 reduce 次数不一致（'+nacks.join('/')+'）→ collective 永不返回';
+        return;
+      }
+    }
+    banner6.className='banner ok'; banner6.textContent='✅ 安全：每 rank 都做了 '+T6NB+' 次 reduce，次数严格相等，全部对齐完成';
+    cap6.innerHTML='每个 batch 恒产 1 个 ack（出错也产）→ <b>ack 数逐 rank 相等</b> → 组2 的 collective 一一对应 → 不会 hang。';
+  }catch(e){ if(e!==0) console.error(e); /* s6 throws 0 when this run is cancelled; anything else is a real error */ }
+}
+function stopTab6(){ ++t6Token; }
+document.getElementById('play6good').onclick=()=>playAck([3,3,3],
+  '<b style="color:var(--green)">正确</b>：即便某 batch 出错，<code>_page_transfer</code> 也<strong>继续循环、照常产 ack</strong> → 三个 rank 都产 3 个 ack。');
+document.getElementById('play6bad').onclick=()=>playAck([3,3,2],
+  '<b style="color:var(--red)">错误（反面教材）</b>：rank2 在 batch2 出错就 <code>break</code> → 只产 2 个 ack，比别人少一个。');
+document.getElementById('reset6').onclick=()=>{ t6reset(); cap6.innerHTML='选择场景：<b>每 batch 恒 1 ack</b> → 次数对齐、安全；<b>出错就 break</b> → ack 缺一个 → 组2 reduce 错位 → hang。'; };
+
+/* ---------- tab switching ---------- */
+const ctl1=document.querySelectorAll('.controls')[1]; // [0] is now ctl5 (story)
+const ctl2=document.getElementById('ctl2');
+const ctl3=document.getElementById('ctl3');
+const ctl4=document.getElementById('ctl4');
+const ctl5=document.getElementById('ctl5');
+const ctl6=document.getElementById('ctl6');
+document.querySelectorAll('.tab').forEach(tab=>{
+  tab.onclick=()=>{
+    document.querySelectorAll('.tab').forEach(x=>x.classList.remove('active'));
+    tab.classList.add('active');
+    const w=tab.dataset.tab;
+    document.getElementById('scene5').classList.toggle('hidden', w!=='story');
+    document.getElementById('scene1').classList.toggle('hidden', w!=='consistency');
+    document.getElementById('scene2').classList.toggle('hidden', w!=='deadlock');
+    document.getElementById('scene3').classList.toggle('hidden', w!=='skew');
+    document.getElementById('scene4').classList.toggle('hidden', w!=='threads');
+    document.getElementById('scene6').classList.toggle('hidden', w!=='ackalign');
+    ctl5.style.display = w==='story'?'flex':'none';
+    ctl1.style.display = w==='consistency'?'flex':'none';
+    ctl2.style.display = w==='deadlock'?'flex':'none';
+    ctl3.style.display = w==='skew'?'flex':'none';
+    ctl4.style.display = w==='threads'?'flex':'none';
+    ctl6.style.display = w==='ackalign'?'flex':'none';
+    if(w==='story'){ ++t1Token; stopTab3(); stopTab4(); stopTab6(); t5Paused=false; document.getElementById('play5').textContent='⏸ 暂停'; runTab5(); }
+    else if(w==='consistency'){ t1Paused=false; document.getElementById('play1').textContent='⏸ 暂停'; runTab1(); stopTab3(); stopTab4(); stopTab5(); stopTab6(); }
+    else if(w==='skew'){ ++t1Token; startTab3(true); stopTab4(); stopTab5(); stopTab6(); }
+    else if(w==='threads'){ ++t1Token; stopTab3(); stopTab5(); stopTab6(); t4Paused=false; document.getElementById('play4').textContent='⏸ 暂停'; runTab4(); }
+    else if(w==='ackalign'){ ++t1Token; stopTab3(); stopTab4(); stopTab5(); t6reset(); }
+    else{ ++t1Token; stopTab3(); stopTab4(); stopTab5(); stopTab6(); }
+  };
+});
+ctl1.style.display='none';
+ctl2.style.display='none';
+ctl3.style.display='none';
+ctl4.style.display='none';
+ctl6.style.display='none';
+resetMesh2();
+runTab5();
+</script>
+
+<script>
+/* ============================================================
+   i18n: translate by text-content (robust to innerHTML normalization).
+   PAIRS = [zhHTML, enHTML]; keys derived from stripped textContent.
+   A MutationObserver re-translates dynamic captions on the fly.
+   ============================================================ */
+(function(){
+  const PAIRS = [
+    // header
+    ['HiCache × Pipeline Parallel：树一致性 & 防死锁','HiCache × Pipeline Parallel: Tree Consistency & Deadlock Avoidance'],
+    ['拓扑 <b>PP=3 × TP=8 = 24 ranks</b> · 行=TP 组、列=PP 组 · MIN all-reduce 保证 radix tree 一致 · 2 套 gloo 组避免后台 collective 死锁',
+     'Topology <b>PP=3 × TP=8 = 24 ranks</b> · rows = TP groups, cols = PP groups · MIN all-reduce keeps radix trees identical · 2 gloo group-sets avoid background-collective deadlock'],
+    // tabs
+    ['① 两请求全流程（L3 命中/未命中 · host tree 一致）','① Two-Request Lifecycle (L3 hit/miss · host-tree consistency)'],
+    ['② 树一致性（自动播放）','② Tree Consistency (auto-play)'],
+    ['③ 为什么 2 个组不死锁','③ Why 2 Group Sets Avoid Deadlock'],
+    ['④ 异步时间差 × MIN 统一步调','④ Async Skew × MIN Lockstep'],
+    ['⑤ 线程关系 & 树一致性','⑤ Thread Relationships & Consistency'],
+    ['⑥ PrefetchAck 对齐 & 防 hang','⑥ PrefetchAck Alignment & Anti-Hang'],
+    // tab6 note / legend / barrier label / scenario captions / banners
+    ['每个 <b>storage batch</b> 在 <code>_page_transfer</code> 里恒产 <b>1 个 PrefetchAck</b>；<code>prefetch_sync_thread</code> 对<strong>每个 ack</strong> 在组2 做一次 <code>all_reduce(MIN)</code>。所以 <b>ack 数 = batch 数 = 组2 collective 次数</b>，必须逐 rank 相等。',
+     'Each <b>storage batch</b> in <code>_page_transfer</code> always emits <b>exactly 1 PrefetchAck</b>; <code>prefetch_sync_thread</code> does one <code>all_reduce(MIN)</code> on set 2 <strong>per ack</strong>. So <b>#acks = #batches = #set-2 collectives</b>, and it must be equal on every rank.'],
+    ['<span class="sw" style="background:var(--blue)"></span>ack 已产出（参与本轮 reduce）','<span class="sw" style="background:var(--blue)"></span>ack emitted (joins this reduce)'],
+    ['<span class="sw" style="background:var(--green)"></span>barrier 凑齐 3/3 → 通过','<span class="sw" style="background:var(--green)"></span>barrier reaches 3/3 → pass'],
+    ['<span class="sw" style="background:var(--amber)"></span>已到达，等待缺席方','<span class="sw" style="background:var(--amber)"></span>arrived, waiting for the absent rank'],
+    ['<span class="sw" style="background:var(--red)"></span>缺失 ack → 永远等不到','<span class="sw" style="background:var(--red)"></span>missing ack → never arrives'],
+    ['◆ <code>all_reduce(MIN)</code> @ 组2（prefetch_completion_sync_groups）· 每个 ack 一次 barrier',
+     '◆ <code>all_reduce(MIN)</code> @ set 2 (prefetch_completion_sync_groups) · one barrier per ack'],
+    ['选择场景：<b>每 batch 恒 1 ack</b> → 次数对齐、安全；<b>出错就 break</b> → ack 缺一个 → 组2 reduce 错位 → hang。',
+     'Pick a scenario: <b>one ack per batch</b> → counts aligned, safe; <b>break on error</b> → one ack missing → set-2 reduces misalign → hang.'],
+    ['<b style="color:var(--green)">正确</b>：即便某 batch 出错，<code>_page_transfer</code> 也<strong>继续循环、照常产 ack</strong> → 三个 rank 都产 3 个 ack。',
+     '<b style="color:var(--green)">Correct</b>: even if a batch errors, <code>_page_transfer</code> <strong>keeps looping and still emits the ack</strong> → all three ranks emit 3 acks.'],
+    ['<b style="color:var(--red)">错误（反面教材）</b>：rank2 在 batch2 出错就 <code>break</code> → 只产 2 个 ack，比别人少一个。',
+     '<b style="color:var(--red)">Wrong (anti-pattern)</b>: rank2 hits an error at batch2 and <code>break</code>s → emits only 2 acks, one fewer than the others.'],
+    ['▶ 正确（每 batch 恒 1 ack）','▶ Correct (one ack per batch)'],
+    ['▶ 错误（出错 break → ack 缺失）','▶ Wrong (break on error → missing ack)'],
+    ['每个 batch 恒产 1 个 ack（出错也产）→ <b>ack 数逐 rank 相等</b> → 组2 的 collective 一一对应 → 不会 hang。',
+     'Each batch always emits one ack (even on error) → <b>ack counts equal across ranks</b> → set-2 collectives match one-to-one → no hang.'],
+    ['✅ 安全：每 rank 都做了 3 次 reduce，次数严格相等，全部对齐完成','✅ Safe: every rank did 3 reduces, counts strictly equal, all aligned and complete'],
+    ['💥 HANG：组2 reduce 次数不一致（3/3/2）→ collective 永不返回','💥 HANG: set-2 reduce counts differ (3/3/2) → the collective never returns'],
+    // tab5 legend chips
+    ['<span class="sw" style="background:var(--blue)"></span>GPU 计算 / 插入中','<span class="sw" style="background:var(--blue)"></span>GPU compute / inserting'],
+    ['<span class="sw" style="background:var(--cyan)"></span>match 命中前缀','<span class="sw" style="background:var(--cyan)"></span>matched prefix'],
+    ['<span class="sw" style="background:var(--amber)"></span>各 rank 不一致（待 MIN 统一）','<span class="sw" style="background:var(--amber)"></span>diverged per rank (await MIN)'],
+    ['<span class="sw" style="background:var(--green)"></span>已提交 / 一致','<span class="sw" style="background:var(--green)"></span>committed / consistent'],
+    ['<span class="sw" style="background:var(--red)"></span>未命中 / 淘汰删除','<span class="sw" style="background:var(--red)"></span>miss / evicted'],
+    // tab5 static labels
+    ['<b>L3 持久化存储</b>（storage backend，3 个 rank 共享视图）',
+     '<b>L3 persistent storage</b> (storage backend, shared view across 3 ranks)'],
+    ['GPU 计算','GPU compute'],
+    ['◆ MIN 组1 · prefetch_hits_sync_groups · storage_hit_count','◆ MIN set 1 · prefetch_hits_sync_groups · storage_hit_count'],
+    ['◆ MIN 组2 · prefetch_completion_sync_groups · completed_tokens','◆ MIN set 2 · prefetch_completion_sync_groups · completed_tokens'],
+    // tab5 consistency flags
+    ['✓ 3 棵 host tree 同步插入 4 个 page（一致）','✓ all 3 host trees insert 4 pages in sync (consistent)'],
+    ['✓ 3 棵 host tree 同步删除（delete 一致）','✓ all 3 host trees delete in sync (consistent)'],
+    ['⚠ 查询长度不一致（4/3/4）→ 若各自建树，host tree 会发散','⚠ query lengths differ (4/3/4) → host trees would diverge if each rank built its own'],
+    ['✓ 抓取范围统一 = 3 → match_prefix 逐 rank 一致','✓ fetch range unified = 3 → match_prefix identical on every rank'],
+    ['⚠ 实际落盘不一致（3/3/2）→ 若各自插入，host tree 会发散','⚠ loaded page counts differ (3/3/2) → host trees would diverge if each rank inserted its own'],
+    ['✓ 插入长度统一 = 2 → 3 棵 host tree 完全一致','✓ insert length unified = 2 → all 3 host trees identical'],
+    // tab5 step captions
+    ['场景 <b>PP=3 × TP=4</b>：每个 PP rank 维护一棵 <b>L2 host radix tree</b>，共享底层 <b>L3 持久化存储</b>。跟踪两个请求，看 host tree 如何保持一致。',
+     'Scenario <b>PP=3 × TP=4</b>: each PP rank keeps an <b>L2 host radix tree</b> over a shared <b>L3 persistent storage</b>. We follow two requests and see how the trees stay consistent.'],
+    ['① <b>请求 A</b> 到达（需要 4 个 page 的前缀），3 个 PP rank 同时处理。',
+     '① <b>Request A</b> arrives (needs a 4-page prefix); all 3 PP ranks process it together.'],
+    ['① 查 L2 host tree → <b style="color:var(--red)">miss</b>；查 L3 → <b style="color:var(--red)">miss</b>（存储为空）。',
+     '① Query L2 host tree → <b style="color:var(--red)">miss</b>; query L3 → <b style="color:var(--red)">miss</b> (storage empty).'],
+    ['① 回退到 <b>GPU 前向计算</b>，生成这 4 个 page 的 KV。',
+     '① Fall back to <b>GPU forward compute</b> to produce the KV for these 4 pages.'],
+    ['① 计算结果写入 <b>L2 host radix tree</b> → 3 个 rank <code>insert</code> <strong style="color:var(--green)">相同</strong>的前缀 p0–p3。',
+     '① Results are written into the <b>L2 host radix tree</b> → all 3 ranks <code>insert</code> the <strong style="color:var(--green)">same</strong> prefix p0–p3.'],
+    ['① backup 线程把 L2 → <b>L3</b> 持久化（<code>write_backup</code> / <code>page_set</code>）。',
+     '① The backup thread persists L2 → <b>L3</b> (<code>write_backup</code> / <code>page_set</code>).'],
+    ['② host 内存压力 → L2 触发<strong style="color:var(--red)">淘汰</strong>（<code>evict_host</code>）。3 棵 host tree <b>完全一致</b> → 淘汰命中<strong>同一批节点</strong>；L3 仍保留。',
+     '② Host-memory pressure → L2 <strong style="color:var(--red)">eviction</strong> (<code>evict_host</code>). The 3 host trees are <b>identical</b> → eviction selects the <strong>same nodes</strong>; L3 still retains the pages.'],
+    ['③ <b>请求 B</b> 到达（复用 A 的前缀）。L2 host tree 已空 → <b style="color:var(--red)">L2 miss</b>，转向 L3。',
+     '③ <b>Request B</b> arrives (reuses A\u2019s prefix). The L2 host tree is empty → <b style="color:var(--red)">L2 miss</b>, fall through to L3.'],
+    ['③ <b>prefetch_thread</b> 各 rank 向 L3 查命中页数 → 结果可能<strong style="color:var(--amber)">不同</strong>（host 视图/内存差异）：4 / 3 / 4。',
+     '③ <b>prefetch_thread</b> on each rank queries L3 for its hit page count → results may <strong style="color:var(--amber)">differ</strong> (differences in host view / memory): 4 / 3 / 4.'],
+    ['◆ <b>第一个 MIN</b> @ <code>prefetch_hits_sync_groups</code>（组1，gloo/CPU，含 TP环+PP环）：<code>all_reduce(MIN)</code> 统一查询长度 = <b>3</b>。',
+     '◆ <b>First MIN</b> @ <code>prefetch_hits_sync_groups</code> (set 1, gloo/CPU, TP+PP rings): <code>all_reduce(MIN)</code> unifies the query length = <b>3</b>.'],
+    ['③ <b>prefetch_io_aux_thread</b> 逐 batch 把 page 从 L3 拉回 L2（<code>_page_transfer</code>），每 batch 产 1 个 PrefetchAck。',
+     '③ <b>prefetch_io_aux_thread</b> pulls pages L3→L2 batch by batch (<code>_page_transfer</code>), one PrefetchAck per batch.'],
+    ['③ 逐页加载<strong style="color:var(--amber)">部分失败</strong>：rank2 第 3 页 <code>page_get</code> 未成功 → completed_tokens = 3 / 3 / 2。',
+     '③ Page-by-page loading <strong style="color:var(--amber)">partially fails</strong>: <code>page_get</code> for rank2\u2019s 3rd page fails → completed_tokens = 3 / 3 / 2.'],
+    ['◆ <b>第二个 MIN</b> @ <code>prefetch_completion_sync_groups</code>（组2，<b>独立 communicator</b>）：<code>all_reduce(MIN)</code> 统一 completed_tokens = <b>2</b>。',
+     '◆ <b>Second MIN</b> @ <code>prefetch_completion_sync_groups</code> (set 2, <b>independent communicator</b>): <code>all_reduce(MIN)</code> unifies completed_tokens = <b>2</b>.'],
+    ['③ 各 rank 只把统一的 <b>2 个 page</b> 插入 L2 host tree（<code>_insert_helper_host</code>）→ 3 棵 host tree <strong style="color:var(--green)">再次完全一致</strong>。',
+     '③ Each rank inserts only the unified <b>2 pages</b> into its L2 host tree (<code>_insert_helper_host</code>) → all 3 host trees are <strong style="color:var(--green)">identical again</strong>.'],
+    ['✅ 两套<strong>独立 gloo 组</strong>（组1 命中数、组2 完成数）+ 每 batch 恒 1 个 ack → 各 rank 对 host tree 的<strong>插入/删除完全一致</strong> → <b style="color:var(--green)">host radix tree 始终一致，后台 collective 不会死锁</b>。',
+     '✅ Two <strong>independent gloo group-sets</strong> (set 1 hit count, set 2 completed tokens) + exactly one ack per batch → every rank\u2019s <strong>inserts/deletes are identical</strong> → <b style="color:var(--green)">host radix trees stay consistent and background collectives never deadlock</b>.'],
+    // buttons
+    ['⏸ 暂停','⏸ Pause'],['▶ 播放','▶ Play'],['⟲ 重播','⟲ Replay'],['重置','Reset'],
+    ['▶ 1 套组（死锁）','▶ 1 group set (deadlock)'],['▶ 2 套组（安全）','▶ 2 group sets (safe)'],
+    // tab1 legend + tree title + init caption
+    ['<span class="sw" style="background:var(--amber)"></span>命中数被截断（不一致）','<span class="sw" style="background:var(--amber)"></span>hit count truncated (inconsistent)'],
+    ['<span class="sw" style="background:var(--blue)"></span>TP 组内 MIN 后','<span class="sw" style="background:var(--blue)"></span>after MIN within TP group'],
+    ['<span class="sw" style="background:var(--green)"></span>PP 组内 MIN 后（全局一致）','<span class="sw" style="background:var(--green)"></span>after MIN within PP group (globally consistent)'],
+    ['所有 24 个 rank 共享同一棵 radix tree','all 24 ranks share one radix tree'],
+    ['自动播放中…','auto-playing…'],
+    // tab1 captions
+    ['拓扑 <b>PP=3 × TP=8 = 24 个 rank</b>：每个 PP stage 下挂 8 个 TP rank。',
+     'Topology <b>PP=3 × TP=8 = 24 ranks</b>: each PP stage holds 8 TP ranks.'],
+    ['① 各 rank <span class="k">独立</span>向 L3 查询前缀命中。<b style="color:var(--amber)">注意 r10、r15 因 host 内存压力被截断</b>（6 / 7 页）。',
+     '① Each rank <span class="k">independently</span> queries L3 for prefix hits. <b style="color:var(--amber)">Note r10 & r15 are truncated by host-memory pressure</b> (6 / 7 pages).'],
+    ['② 若各 rank 按自己的命中数建 radix tree → 树高不一致 → 后续 PP 集合通信 <b style="color:var(--red)">shape mismatch → crash</b>。',
+     '② If each rank builds its radix tree from its own hit count → tree heights differ → subsequent PP collectives hit a <b style="color:var(--red)">shape mismatch → crash</b>.'],
+    ['③ 第一步：在 <span class="k">TP 组（每一行 8 个 rank）</span>内 <code>all_reduce(MIN)</code>。',
+     '③ Step 1: <code>all_reduce(MIN)</code> within each <span class="k">TP group (a row of 8 ranks)</span>.'],
+    ['③ TP 组归约后：<b>每一行变得一致</b>（PP0=8, PP1=6, PP2=8 = 各行最小值）。',
+     '③ After TP reduce: <b>each row is uniform</b> (PP0=8, PP1=6, PP2=8 = per-row min).'],
+    ['④ 第二步：在 <span class="k">PP 组（每一列 3 个 rank）</span>内 <code>all_reduce(MIN)</code> → 收敛到全局最小值。',
+     '④ Step 2: <code>all_reduce(MIN)</code> within each <span class="k">PP group (a column of 3 ranks)</span> → converge to the global minimum.'],
+    ['④ PP 组归约后：<b style="color:var(--green)">全部 24 个 rank 命中数 = 6</b>（最长公共前缀）。',
+     '④ After PP reduce: <b style="color:var(--green)">hit count = 6 on all 24 ranks</b> (longest common prefix).'],
+    ['⑤ 所有 rank 都只 prefetch / 建树到 6 → <span style="color:var(--green)">24 个 rank 的 radix tree 完全一致 ✓</span>',
+     '⑤ Every rank prefetches / builds the tree only up to 6 → <span style="color:var(--green)">all 24 radix trees are identical ✓</span>'],
+    // tab2 legend + note + groups + init + captions + banners
+    ['<span class="sw" style="background:var(--purple)"></span><b>prefetch_thread</b>（独立后台线程）· reduce(storage_hit_count)','<span class="sw" style="background:var(--purple)"></span><b>prefetch_thread</b> (independent background thread) · reduce(storage_hit_count)'],
+    ['<span class="sw" style="background:var(--cyan)"></span><b>prefetch_sync_thread</b>（独立后台线程）· reduce(completed_tokens)','<span class="sw" style="background:var(--cyan)"></span><b>prefetch_sync_thread</b> (independent background thread) · reduce(completed_tokens)'],
+    ['每个 cell = 1 个 rank，内含 2 个独立后台线程（小圆点 ●A ●B）。每一行是一个 <b>TP communicator</b>，每一列是一个 <b>PP communicator</b>。',
+     'Each cell = 1 rank, holding 2 independent background threads (dots ●A ●B). Each row is a <b>TP communicator</b>, each column a <b>PP communicator</b>.'],
+    ['<b>prefetch_hits_sync_groups</b><br>命中页数归约组（含 TP 环 + PP 环）<br><span style="font-size:11px">reduce(storage_hit_count)</span>',
+     '<b>prefetch_hits_sync_groups</b><br>hit-count reduce set (TP rings + PP rings)<br><span style="font-size:11px">reduce(storage_hit_count)</span>'],
+    ['<b>prefetch_completion_sync_groups</b><br>完成 token 归约组（含 TP 环 + PP 环）<br><span style="font-size:11px">reduce(completed_tokens)</span>',
+     '<b>prefetch_completion_sync_groups</b><br>completed-token reduce set (TP rings + PP rings)<br><span style="font-size:11px">reduce(completed_tokens)</span>'],
+    ['选择场景：用 <b>1 套组</b> 会死锁，用 <b>2 套组</b> 则安全。','Pick a scenario: <b>1 group set</b> deadlocks, <b>2 group sets</b> are safe.'],
+    ['只有 <b>1 套组</b>：prefetch_thread(A) 与 prefetch_sync_thread(B) 共用同一个 communicator 集。',
+     'Only <b>1 group set</b>: prefetch_thread(A) and prefetch_sync_thread(B) share the same communicator set.'],
+    ['两个后台线程<b>独立调度、顺序不定</b>：同一个 TP 环里，有的 rank 先发 A，有的先发 B。',
+     'The two background threads are <b>scheduled independently, order unpredictable</b>: within one TP ring some ranks post A first, others post B first.'],
+    ['同一个 communicator 上各 rank 提交的 collective <b style="color:var(--red)">不是同一个</b>（A 与 B 错位）→ rendezvous 永远配不上。',
+     'On the same communicator the collectives submitted by different ranks are <b style="color:var(--red)">not the same</b> (A vs B misaligned) → rendezvous never matches.'],
+    ['只要任一 communicator 上 A/B 交错，该环就死锁 → 全局 PP/TP 通信连环卡住。',
+     'If A and B interleave on any one communicator, that ring deadlocks → global PP/TP communication stalls in a chain reaction.'],
+    ['💥 DEADLOCK — 整个 24-rank job 卡死','💥 DEADLOCK — the whole 24-rank job hangs'],
+    ['用 <b>2 套独立组</b>：<b style="color:var(--purple)">A 永远走 prefetch_hits_sync_groups</b>，<b style="color:var(--cyan)">B 永远走 prefetch_completion_sync_groups</b>。',
+     'With <b>2 independent group sets</b>: <b style="color:var(--purple)">A always uses prefetch_hits_sync_groups</b>, <b style="color:var(--cyan)">B always uses prefetch_completion_sync_groups</b>.'],
+    ['第一波：所有 rank 的 <b>prefetch_thread</b> 只在 <code>prefetch_hits_sync_groups</code> 上提交 A → 序列一致。',
+     'Wave 1: every rank\u2019s <b>prefetch_thread</b> posts A only on <code>prefetch_hits_sync_groups</code> → consistent order.'],
+    ['✓ TP 环 + PP 环上 A 全部到齐 → 第一波归约完成。','✓ A arrives on every TP ring + PP ring → wave 1 reduce done.'],
+    ['第二波：所有 rank 的 <b>prefetch_sync_thread</b> 只在 <code>prefetch_completion_sync_groups</code> 上提交 B → 序列一致。',
+     'Wave 2: every rank\u2019s <b>prefetch_sync_thread</b> posts B only on <code>prefetch_completion_sync_groups</code> → consistent order.'],
+    ['每个 communicator 上的 collective 序列在所有 rank <b style="color:var(--green)">完全一致</b>（A→组1、B→组2 不交叉）→ 不会死锁。',
+     'The collective sequence on each communicator is <b style="color:var(--green)">identical across ranks</b> (A→set1, B→set2, never crossing) → no deadlock.'],
+    ['✅ 安全 — 24 个 rank 全部对齐完成','✅ Safe — all 24 ranks aligned and complete'],
+    // tab3 titles / conduits / flow-note / lane hint / captions / formers
+    ['③ 主 PP 流水线执行<strong>时序</strong> <span class="tag gpu">NCCL · GPU</span> <span style="color:var(--muted);font-size:11px;">时序连续、错峰流动，<strong style="color:var(--green)">不被后台 prefetch 同步打断</strong></span>',
+     '③ Main PP pipeline execution <strong>timing</strong> <span class="tag gpu">NCCL · GPU</span> <span style="color:var(--muted);font-size:11px;">continuous, staggered flow, <strong style="color:var(--green)">never interrupted by background prefetch sync</strong></span>'],
+    ['↑ 流水线跑的正是②组好的 <strong>mb0→mb3</strong>，沿 stage0→1→2 错峰对角推进',
+     '↑ The pipeline runs exactly the <strong>mb0→mb3</strong> composed in ②, advancing diagonally stage0→1→2'],
+    ['▲ 组好的 <b>batch &amp; micro-batch 顺序</b> 喂给流水线（内容）','▲ The composed <b>batch &amp; micro-batch order</b> feeds the pipeline (content)'],
+    ['② 三个 PP rank 用<strong>同一个 storage hit</strong> 组 batch（内容必须逐 rank 一致）',
+     '② The three PP ranks compose the batch from <strong>the same storage hit</strong> (contents must be identical on every rank)'],
+    ['▲ <code>all_reduce(MIN)</code> 输出统一值 <b>6</b> → 决定 batch size','▲ <code>all_reduce(MIN)</code> outputs the unified value <b>6</b> → determines batch size'],
+    ['① 异步 prefetch 查询 → <code>all_reduce(MIN)</code> <span class="tag cpu">gloo · CPU 后台线程</span>',
+     '① Async prefetch query → <code>all_reduce(MIN)</code> <span class="tag cpu">gloo · CPU background thread</span>'],
+    ['（等待②组好的 batch…）','(waiting for batch from ②…)'],
+    ['① 三个 PP rank 的 prefetch 查询<strong>异步发起</strong>（到达时刻不同）。','① The three PP ranks issue prefetch queries <strong>asynchronously</strong> (different arrival times).'],
+    ['① 先到的 rank 在 <span class="k">gloo CPU 后台线程</span>上<b style="color:var(--amber)">等待对齐</b>（不占 GPU）。',
+     '① Earlier ranks <b style="color:var(--amber)">wait to align</b> on the <span class="k">gloo CPU background thread</span> (no GPU use).'],
+    ['① <b style="color:var(--amber)">pp2 最慢</b>到达 → <code>all_reduce(MIN)</code> 把 8/6/7 <strong style="color:var(--green)">统一成 6</strong>。',
+     '① <b style="color:var(--amber)">pp2 is slowest</b> to arrive → <code>all_reduce(MIN)</code> unifies 8/6/7 <strong style="color:var(--green)">into 6</strong>.'],
+    ['② 统一后的 <b>storage hit = 6</b> 下发给各 rank 调度器 → 决定<strong>已缓存前缀长度 / batch size / micro-batch 顺序</strong>（mb0→mb3）。',
+     '② The unified <b>storage hit = 6</b> goes to each rank\u2019s scheduler → determines <strong>cached prefix length / batch size / micro-batch order</strong> (mb0→mb3).'],
+    ['③ 三个 rank 因拿到<strong style="color:var(--green)">同一个 6</strong> 而组出<strong style="color:var(--green)">完全一致的 batch 与 mb 顺序</strong>，喂给 PP 流水线；执行时序连续不被打断。<br><span style="color:var(--red)">⚠ 若 storage hit 不统一 → batch/mb 顺序逐 rank 发散 → PP 调度错位、卡死。</span>',
+     '③ Because all three ranks get <strong style="color:var(--green)">the same 6</strong>, they compose <strong style="color:var(--green)">identical batches and mb order</strong> fed to the PP pipeline; timing stays continuous.<br><span style="color:var(--red)">⚠ If storage hit weren\u2019t unified → batch/mb order diverges per rank → PP scheduling mismatch & hang.</span>'],
+    // formers
+    ['调度器 · PP rank 0','Scheduler · PP rank 0'],['调度器 · PP rank 1','Scheduler · PP rank 1'],['调度器 · PP rank 2','Scheduler · PP rank 2'],
+    ['已缓存前缀 storage hit = ','cached prefix storage hit = '],
+    [' 页 → 决定 batch 组成',' pages → determines batch composition'],
+    ['✓ batch & mb 顺序一致','✓ identical batch & mb order'],
+    // mesh labels
+    ['<b>PP stage 0</b><br>(TP 组)','<b>PP stage 0</b><br>(TP group)'],
+    ['<b>PP stage 1</b><br>(TP 组)','<b>PP stage 1</b><br>(TP group)'],
+    ['<b>PP stage 2</b><br>(TP 组)','<b>PP stage 2</b><br>(TP group)'],
+    ['PP 组(列)<br>每列跨 3 个 stage →','PP groups (cols)<br>each spans 3 stages →'],
+    // tab4 flow boxes
+    ['调度器 Scheduler <span class="pin">主线程</span>','Scheduler <span class="pin">main thread</span>'],
+    ['发起 prefetch 请求（writeback / load）','Issues prefetch requests (writeback / load)'],
+    ['▼ <b>prefetch_queue</b>（PrefetchOperation）','▼ <b>prefetch_queue</b> (PrefetchOperation)'],
+    ['① prefetch_thread <span class="pin">storage-hit 线程</span>','① prefetch_thread <span class="pin">storage-hit thread</span>'],
+    ['<code>_storage_hit_query()</code> 查询 L3 命中页数；命中足够→放 prefetch_buffer，不足→prefetch_revoke_queue',
+     '<code>_storage_hit_query()</code> queries L3 hit pages; enough hits → prefetch_buffer, too few → prefetch_revoke_queue'],
+    ['◆ all_reduce(MIN) storage_hit_count <small>@ prefetch_hits_sync_groups（组1，gloo/CPU，含 TP 环 + PP 环）</small>',
+     '◆ all_reduce(MIN) storage_hit_count <small>@ prefetch_hits_sync_groups (set 1, gloo/CPU, TP rings + PP rings)</small>'],
+    ['▼ <b>prefetch_buffer</b>','▼ <b>prefetch_buffer</b>'],
+    ['② prefetch_io_aux_thread <span class="pin">IO 加载线程</span>','② prefetch_io_aux_thread <span class="pin">IO load thread</span>'],
+    ['<code>_page_transfer()</code> 逐 batch 把页从 L3 读入 host；累加 <b>completed_tokens</b>；<b>每个 storage batch 产生 1 个 PrefetchAck</b>（出错也照常产生）',
+     '<code>_page_transfer()</code> loads pages L3→host batch by batch; accumulates <b>completed_tokens</b>; <b>each storage batch emits exactly 1 PrefetchAck</b> (even on error)'],
+    ['▼ <b>prefetch_sync_queue</b>（PrefetchAck）','▼ <b>prefetch_sync_queue</b> (PrefetchAck)'],
+    ['③ prefetch_sync_thread <span class="pin">completion-token 线程</span>','③ prefetch_sync_thread <span class="pin">completion-token thread</span>'],
+    ['对每个 ack 的 <b>completed_tokens</b> 做归约','Reduces <b>completed_tokens</b> of every ack'],
+    ['◆ all_reduce(MIN) completed_tokens <small>@ prefetch_completion_sync_groups（组2，gloo/CPU，含 TP 环 + PP 环）</small>',
+     '◆ all_reduce(MIN) completed_tokens <small>@ prefetch_completion_sync_groups (set 2, gloo/CPU, TP rings + PP rings)</small>'],
+    ['▼ <b>ack_prefetch_queue</b>','▼ <b>ack_prefetch_queue</b>'],
+    ['调度器写入 host radix tree','Scheduler inserts into host radix tree'],
+    ['只插入 <b>completed_tokens</b> 长度的前缀 → <code>_insert_helper_host()</code>','Inserts only the <b>completed_tokens</b>-long prefix → <code>_insert_helper_host()</code>'],
+    ['为什么 MIN(storage_hit) 一致？','Why does MIN(storage_hit) ensure consistency?'],
+    ['各 rank 命中可能不同（host 内存截断、L3 视图差异）。MIN 取<b>最长公共可命中前缀</b> → 所有 rank <b>抓取范围一致</b>，不会各抓不同长度。',
+     'Hits may differ per rank (host-mem truncation, L3 view differences). MIN takes the <b>longest common hittable prefix</b> → every rank <b>fetches the same range</b>, never different lengths.'],
+    ['为什么 MIN(completed_tokens) 一致？','Why does MIN(completed_tokens) ensure consistency?'],
+    ['即便抓取范围一致，实际逐页加载仍可能<b>部分失败</b>（<code>page_get</code> 返回 n≠batch）。MIN 只提交<b>所有 rank 都成功落盘的最长公共前缀</b> → 写入 host tree 的长度逐 rank 相同。',
+     'Even with the same fetch range, per-page loads can <b>partially fail</b> (<code>page_get</code> returns n≠batch). MIN commits only the <b>longest common prefix every rank loaded successfully</b> → identical insert length per rank.'],
+    ['为什么不会 hang？','Why no hang?'],
+    ['每个 storage batch <b>都产生且仅产生一个 PrefetchAck</b>（即使出错也照常产生）→ 每个 rank 参与的 reduce <b>次数严格相等</b>，collective 一一对齐。两个 MIN 一起保证：<b>插入 host tree 的前缀逐 rank 完全相同 → 树一致</b>。',
+     'Each storage batch <b>emits exactly one PrefetchAck</b> (even on error) → every rank joins the <b>same number of reduces</b>, collectives align one-to-one. The two MINs together guarantee: <b>the prefix inserted into the host tree is identical per rank → trees are consistent</b>.'],
+    ['两个 MIN 同步点（组1 命中数、组2 完成数）+ 每 batch 恒定 1 个 ack，共同保证 PP 各 rank 的 host radix tree 严格一致。',
+     'Two MIN sync points (set 1 = hit count, set 2 = completed tokens) + exactly one ack per batch together keep every PP rank\u2019s host radix tree strictly identical.'],
+    // tab4 animated step captions
+    ['沿数据流向下逐步点亮：两个 MIN 同步点 + 每 batch 恒定 1 个 ack。',
+     'Light up step by step along the data flow: two MIN sync points + exactly one ack per batch.'],
+    ['调度器主线程把 prefetch 请求（writeback / load）放入队列，触发后台流水线。',
+     'The scheduler main thread enqueues a prefetch request (writeback / load), kicking off the background pipeline.'],
+    ['<b>PrefetchOperation</b> 入队 <code>prefetch_queue</code>，交给后台线程处理。',
+     'A <b>PrefetchOperation</b> enters <code>prefetch_queue</code> and is handed to the background threads.'],
+    ['① <b>prefetch_thread</b> 调 <code>_storage_hit_query()</code> 查询 L3 命中页数（各 rank 可能不同）。',
+     '① <b>prefetch_thread</b> calls <code>_storage_hit_query()</code> to query the L3 hit page count (may differ per rank).'],
+    ['◆ <b style="color:var(--amber)">第一个 MIN</b>：在 <code>prefetch_hits_sync_groups</code>（组1）对 storage_hit_count 取最小 → <b>抓取范围逐 rank 一致</b>。',
+     '◆ <b style="color:var(--amber)">First MIN</b>: take the min of storage_hit_count on <code>prefetch_hits_sync_groups</code> (set 1) → <b>the fetch range is identical per rank</b>.'],
+    ['命中足够的请求落入 <code>prefetch_buffer</code>，进入实际 IO 加载。',
+     'Requests with enough hits drop into <code>prefetch_buffer</code> for the actual IO load.'],
+    ['② <b>prefetch_io_aux_thread</b> 用 <code>_page_transfer()</code> 逐 batch 把页 L3→host；<b>每个 batch 恒产生 1 个 PrefetchAck</b>（出错也产生）。',
+     '② <b>prefetch_io_aux_thread</b> uses <code>_page_transfer()</code> to move pages L3→host batch by batch; <b>each batch always emits exactly one PrefetchAck</b> (even on error).'],
+    ['每个 batch 的 <b>PrefetchAck</b> 入队 <code>prefetch_sync_queue</code>。',
+     'Each batch\u2019s <b>PrefetchAck</b> enters <code>prefetch_sync_queue</code>.'],
+    ['③ <b>prefetch_sync_thread</b> 对每个 ack 的 <b>completed_tokens</b> 做归约。',
+     '③ <b>prefetch_sync_thread</b> reduces the <b>completed_tokens</b> of every ack.'],
+    ['◆ <b style="color:var(--green)">第二个 MIN</b>：在 <code>prefetch_completion_sync_groups</code>（组2）对 completed_tokens 取最小 → <b>真正落盘前缀逐 rank 一致</b>。',
+     '◆ <b style="color:var(--green)">Second MIN</b>: take the min of completed_tokens on <code>prefetch_completion_sync_groups</code> (set 2) → <b>the actually-loaded prefix is identical per rank</b>.'],
+    ['统一后的结果入队 <code>ack_prefetch_queue</code> 回到调度器。',
+     'The unified result is enqueued on <code>ack_prefetch_queue</code> and returns to the scheduler.'],
+    ['调度器只插入 <b>completed_tokens</b> 长度的前缀 → <code>_insert_helper_host()</code>。每 batch 恒 1 个 ack，<b>reduce 次数严格相等 → 不会 hang</b>。',
+     'The scheduler inserts only the <b>completed_tokens</b>-long prefix → <code>_insert_helper_host()</code>. One ack per batch means <b>equal reduce counts → no hang</b>.'],
+    ['✅ 闭环：两个 MIN（组1 命中数 + 组2 完成数）+ 每 batch 1 个 ack → <b style="color:var(--green)">PP 各 rank 的 host radix tree 严格一致</b>。',
+     '✅ Closed loop: two MINs (set 1 hit count + set 2 completed tokens) + one ack per batch → <b style="color:var(--green)">every PP rank\u2019s host radix tree is strictly identical</b>.'],
+  ];
+
+  const SEL = ['header h1','header p','.tab','.tree-title','.caption','.banner','.legend .chip',
+    '#scene2 .note','.grp','.t3-title','.clabel','.flow-note','.lane-hint','.ctl',
+    '.pp-label','.pp-foot .lab','.former h5','.former .hitbox .ht1','.former .hitbox .ht2','.former .chk',
+    '.tbox .tname','.tbox .tdesc','.tarrow','.minnode','.whycard h4','.whycard p',
+    '.gpu-badge span','.l3lab','.syncbadge','.consist-flag','#scene6 .note','.barlabel'].join(',');
+
+  const tmp=document.createElement('div');
+  const strip=h=>{ tmp.innerHTML=h; return tmp.textContent.replace(/\s+/g,' ').trim(); };
+  const EN={}, ZH={};
+  PAIRS.forEach(([zh,en])=>{ EN[strip(zh)]=en; ZH[strip(en)]=zh; });
+
+  let LANG='zh';
+  // runtime helper for dynamic strings that contain interpolated values
+  // (cannot be matched by the static dictionary). Reads the live LANG.
+  window.TR=(zh,en)=> LANG==='en' ? en : zh;
+  let mo=null;
+  let suppress=false;   // re-entrancy guard: ignore mutations we cause ourselves
+  function translateEl(el){
+    const k=strip(el.innerHTML);
+    const next = LANG==='en' ? EN[k] : ZH[k];
+    // only write when there is a real change, otherwise we churn the DOM
+    if(next!==undefined && next!==el.innerHTML) el.innerHTML=next;
+  }
+  function translateAll(){
+    suppress=true;
+    document.querySelectorAll(SEL).forEach(translateEl);
+    if(mo) mo.takeRecords();   // drop the records our own writes just generated
+    suppress=false;
+  }
+
+  window.toggleLang=function(){
+    LANG = LANG==='zh' ? 'en' : 'zh';
+    document.getElementById('langBtn').textContent = LANG==='zh' ? 'EN' : '中文';
+    document.documentElement.lang = LANG==='zh' ? 'zh-CN' : 'en';
+    translateAll();
+  };
+
+  // keep dynamic captions translated as JS rewrites them
+  mo=new MutationObserver(muts=>{
+    if(suppress) return;       // skip the mutations our own translations produced
+    suppress=true;
+    muts.forEach(m=>{
+      const tgt = m.target.nodeType===1 ? m.target : m.target.parentElement;
+      if(!tgt) return;
+      const c = tgt.closest && tgt.closest(SEL);
+      if(c) translateEl(c);
+    });
+    mo.takeRecords();
+    suppress=false;
+  });
+  mo.observe(document.body,{subtree:true,childList:true,characterData:true});
+})();
+</script>
+</body>
+</html>
+<style>#langBtn,header,.tabs{display:none!important;}body{background:#0e1117;}.wrap{padding-top:10px;}</style>
+<script>
+(function(){
+  // English-only, single-tab embed: reuse all original JS
+  try{ if(window.toggleLang) toggleLang(); }catch(e){}   // zh -> en
+  var TAB="deadlock";
+  var btn=document.querySelector('.tab[data-tab="'+TAB+'"]');
+  if(btn){ btn.click(); }
+})();
+</script>
diff --git a/public/images/blog/pp_hicache_consistency/hicache_pp_animation_en_lifecycle.html b/public/images/blog/pp_hicache_consistency/hicache_pp_animation_en_lifecycle.html
new file mode 100644
index 000000000..648f386ce
--- /dev/null
+++ b/public/images/blog/pp_hicache_consistency/hicache_pp_animation_en_lifecycle.html
@@ -0,0 +1,1559 @@
+<!DOCTYPE html>
+<html lang="zh-CN">
+<head>
+<meta charset="UTF-8" />
+<meta name="viewport" content="width=device-width, initial-scale=1.0" />
+<title>HiCache × PP=3 · TP=8: Tree Consistency & Deadlock Prevention Animation</title>
+<style>
+  :root{
+    --bg:#0e1117; --panel:#161b22; --panel2:#1c2330; --line:#30363d;
+    --text:#e6edf3; --muted:#8b949e;
+    --blue:#58a6ff; --green:#3fb950; --red:#f85149; --amber:#d29922;
+    --purple:#bc8cff; --cyan:#56d4dd;
+  }
+  *{box-sizing:border-box;}
+  body{
+    margin:0; background:radial-gradient(1200px 600px at 50% -10%, #18202c, var(--bg));
+    color:var(--text); font-family:-apple-system,BlinkMacSystemFont,"Segoe UI","PingFang SC","Microsoft YaHei",sans-serif;
+    line-height:1.5;
+  }
+  #langBtn{ position:fixed; top:14px; right:16px; z-index:50; background:var(--panel2); border:1px solid var(--blue);
+    color:var(--blue); padding:7px 14px; border-radius:999px; cursor:pointer; font-size:13px; font-weight:600; }
+  #langBtn:hover{ background:var(--blue); color:#04101f; }
+  header{ text-align:center; padding:20px 16px 4px; }
+  header h1{ margin:0 0 4px; font-size:21px; }
+  header p{ margin:0; color:var(--muted); font-size:13px; }
+  .tabs{ display:flex; gap:8px; justify-content:center; margin:16px auto 8px; flex-wrap:wrap; }
+  .tab{ background:var(--panel); border:1px solid var(--line); color:var(--text);
+    padding:9px 16px; border-radius:999px; cursor:pointer; font-size:14px; transition:all .15s; }
+  .tab.active{ background:var(--blue); color:#04101f; border-color:var(--blue); font-weight:600; }
+  .wrap{ max-width:1120px; margin:0 auto; padding:0 16px 60px; }
+  .scene{ background:var(--panel); border:1px solid var(--line); border-radius:14px; padding:18px; position:relative; }
+  .hidden{ display:none; }
+  .controls{ display:flex; gap:10px; justify-content:center; align-items:center; margin:14px 0 4px; flex-wrap:wrap; }
+  button.ctl{ background:var(--panel2); border:1px solid var(--line); color:var(--text);
+    padding:8px 16px; border-radius:8px; cursor:pointer; font-size:14px; }
+  button.ctl:hover{ border-color:var(--blue); }
+  button.ctl.primary{ background:var(--green); color:#04140a; border-color:var(--green); font-weight:600; }
+  button.ctl.alt{ background:var(--red); color:#1a0606; border-color:var(--red); font-weight:600; }
+  .caption{ text-align:center; min-height:44px; margin:10px auto 0; max-width:900px; font-size:15px; }
+  .caption .k{ color:var(--cyan); font-weight:600; }
+  code{ background:#0d1117; padding:1px 6px; border-radius:4px; border:1px solid var(--line); color:var(--cyan); font-size:12px; }
+  .legend{ display:flex; gap:18px; justify-content:center; font-size:12px; color:var(--muted); flex-wrap:wrap; margin-bottom:10px; }
+  .chip{ display:inline-flex; align-items:center; gap:6px; }
+  .sw{ width:14px; height:14px; border-radius:4px; display:inline-block; }
+
+  /* ---------- mesh ---------- */
+  .mesh-head{ display:flex; align-items:center; gap:8px; margin-left:96px; margin-bottom:4px; }
+  .tp-hdr{ flex:1; display:flex; gap:6px; }
+  .tp-hdr .th{ flex:1; text-align:center; font-size:11px; color:var(--muted); }
+  .pp-row{ display:flex; align-items:center; gap:8px; margin-bottom:6px; }
+  .pp-label{ width:88px; font-size:12px; color:var(--muted); text-align:right; line-height:1.2; }
+  .pp-label b{ color:var(--text); }
+  .row-cells{ flex:1; display:flex; gap:6px; border:2px solid transparent; border-radius:10px; padding:3px; transition:border-color .3s, box-shadow .3s; }
+  .row-cells.ring-a{ border-color:var(--purple); box-shadow:0 0 12px rgba(188,140,255,.25); }
+  .row-cells.ring-b{ border-color:var(--cyan); box-shadow:0 0 12px rgba(86,212,221,.25); }
+  .row-cells.ring-bad{ border-color:var(--red); box-shadow:0 0 12px rgba(248,81,73,.3); }
+  .cell{
+    flex:1; height:54px; border-radius:8px; background:#0d1117; border:1px solid var(--line);
+    display:flex; flex-direction:column; align-items:center; justify-content:center; gap:1px;
+    transition:background .3s, border-color .3s, transform .15s, box-shadow .15s; position:relative;
+  }
+  .cell .v{ font-size:18px; font-weight:800; color:var(--muted); transition:color .3s; }
+  .cell .v small{ font-size:9px; font-weight:500; }
+  .cell .rk{ font-size:9px; color:#5b6470; }
+  .cell.varied .v{ color:var(--amber); }
+  .cell.tpmin{ background:linear-gradient(180deg,#143055,#102844); border-color:var(--blue); }
+  .cell.tpmin .v{ color:#cfe6ff; }
+  .cell.gmin{ background:linear-gradient(180deg,#0f3a1d,#0e2c18); border-color:var(--green); }
+  .cell.gmin .v{ color:#c4f7d4; }
+  .cell.sweep{ transform:translateY(-3px); box-shadow:0 0 14px rgba(88,166,255,.55); border-color:var(--blue); }
+  .cell.bad{ background:linear-gradient(180deg,#3a1414,#2a1010); border-color:var(--red); }
+  .cell.bad .v{ color:#ffd4d0; }
+  .cell.dim{ opacity:.35; }
+  /* thread dots */
+  .tdots{ display:flex; gap:5px; margin-top:1px; }
+  .td{ width:9px; height:9px; border-radius:50%; border:1px solid var(--line); background:#0d1117; transition:all .25s; }
+  .td.a{ border-color:var(--purple); }
+  .td.b{ border-color:var(--cyan); }
+  .td.a.on{ background:var(--purple); box-shadow:0 0 8px var(--purple); }
+  .td.b.on{ background:var(--cyan); box-shadow:0 0 8px var(--cyan); }
+  .td.done{ background:var(--green); border-color:var(--green); box-shadow:0 0 6px var(--green); }
+  .td.dead{ background:var(--red); border-color:var(--red); box-shadow:0 0 6px var(--red); }
+
+  /* pp-group footer (columns) */
+  .pp-foot{ display:flex; align-items:center; gap:8px; margin-top:6px; }
+  .pp-foot .lab{ width:88px; font-size:11px; color:var(--muted); text-align:right; }
+  .pp-foot .cols{ flex:1; display:flex; gap:6px; padding:0 3px; }
+  .pp-foot .col{ flex:1; height:20px; border-radius:6px; border:1px dashed var(--line); font-size:9px;
+    color:#5b6470; display:flex; align-items:center; justify-content:center; transition:all .3s; }
+  .pp-foot .col.ring-a{ border-color:var(--purple); color:#e3d3ff; }
+  .pp-foot .col.ring-b{ border-color:var(--cyan); color:#cdf6fa; }
+  .pp-foot .col.ring-bad{ border-color:var(--red); color:#ffd4d0; }
+
+  /* shared tree (tab1) */
+  .tree-box{ margin-top:14px; display:flex; flex-direction:column; align-items:center; }
+  .tree-title{ font-size:12px; color:var(--muted); margin-bottom:6px; }
+  .tree{ display:flex; flex-direction:column; align-items:center; gap:4px; min-height:30px; }
+  .tnode{ width:220px; height:20px; border-radius:5px; background:#0d1117; border:1px solid var(--line);
+    display:flex; align-items:center; justify-content:center; font-size:11px; color:var(--muted);
+    opacity:0; transform:translateY(-6px); transition:opacity .25s, transform .25s; }
+  .tnode.show{ opacity:1; transform:none; background:linear-gradient(90deg,#0f3a1d,#196b32); border-color:var(--green); color:#c4f7d4; }
+
+  /* groups panel (tab2) */
+  .groups{ display:flex; gap:16px; justify-content:center; margin-top:12px; flex-wrap:wrap; }
+  .grp{ border:1px dashed var(--line); border-radius:10px; padding:8px 14px; font-size:12px; color:var(--muted);
+    min-width:230px; text-align:center; transition:all .3s; }
+  .grp b{ color:var(--text); }
+  .grp.g1.hot{ border-color:var(--purple); color:#e3d3ff; box-shadow:0 0 14px rgba(188,140,255,.25); }
+  .grp.g2.hot{ border-color:var(--cyan); color:#cdf6fa; box-shadow:0 0 14px rgba(86,212,221,.25); }
+  .banner{ text-align:center; font-weight:800; font-size:18px; min-height:24px; margin-top:8px; }
+  .banner.ok{ color:var(--green); } .banner.bad{ color:var(--red); }
+  .note{ font-size:12px; color:var(--muted); text-align:center; margin-top:4px; }
+
+  /* ---------- tab3: async skew x MIN ---------- */
+  .t3-section{ background:var(--panel2); border:1px solid var(--line); border-radius:12px; padding:10px 12px; margin-bottom:14px; }
+  .t3-title{ font-size:13px; margin-bottom:8px; display:flex; align-items:center; gap:8px; }
+  .t3-title .tag{ font-size:10px; padding:2px 8px; border-radius:999px; border:1px solid var(--line); color:var(--muted); }
+  .t3-title .tag.gpu{ border-color:var(--blue); color:#cfe6ff; }
+  .t3-title .tag.cpu{ border-color:var(--amber); color:#ffe2ab; }
+  /* top pipeline */
+  .pipe{ position:relative; }
+  .lane{ position:relative; height:34px; margin:6px 0; border-radius:8px; background:#0d1117;
+    border:1px solid var(--line); overflow:hidden; }
+  .lane .lname{ position:absolute; left:8px; top:50%; transform:translateY(-50%); font-size:11px; color:var(--muted); z-index:3; }
+  .mb{ position:absolute; top:6px; height:22px; width:52px; border-radius:6px; z-index:2;
+    display:flex; align-items:center; justify-content:center; font-size:10px; font-weight:700;
+    opacity:0; color:#c4f7d4; border:1px solid var(--green);
+    background:linear-gradient(180deg,#0f3a1d,#15311f); transition:opacity .18s; }
+  .lane-hint{ position:absolute; right:10px; top:50%; transform:translateY(-50%); font-size:10px; color:#46505e; z-index:1; }
+  .flow-note{ font-size:11px; color:var(--green); text-align:right; margin-top:2px; }
+
+  /* bottom sync area */
+  .sync-area{ position:relative; height:170px; border-radius:8px; background:#0d1117; border:1px solid var(--line); overflow:hidden; }
+  .barrier{ position:absolute; top:6px; bottom:6px; width:0; border-left:2px dashed var(--amber); z-index:2; }
+  .barrier .blabel{ position:absolute; top:-2px; left:8px; font-size:11px; color:var(--amber); white-space:nowrap; }
+  .barrier.fire{ border-left-color:var(--green); box-shadow:-2px 0 18px rgba(63,185,80,.5); }
+  .slane{ position:absolute; left:0; right:0; height:1px; border-top:1px dashed #1f2733; }
+  .slabel{ position:absolute; left:8px; font-size:11px; color:var(--muted); z-index:3; transform:translateY(-50%); }
+  .pkt{ position:absolute; height:30px; width:108px; border-radius:8px; z-index:3; transform:translateY(-50%);
+    display:flex; align-items:center; justify-content:center; gap:6px; font-size:11px; font-weight:700;
+    border:1px solid var(--line); background:#11161f; transition:background .25s, border-color .25s, box-shadow .25s; }
+  .pkt .hv{ font-size:14px; }
+  .pkt.travel{ border-color:var(--blue); color:#cfe6ff; }
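+  .pkt.wait{ border-color:var(--amber); color:#ffe2ab; background:#1d1808; animation:pulse 1s infinite; } /* assumed keyframes: pulse is referenced but not defined in this stylesheet */ @keyframes pulse{ 0%,100%{opacity:1;} 50%{opacity:.55;} }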
+  .pkt.unified{ border-color:var(--green); color:#c4f7d4; background:linear-gradient(180deg,#0f3a1d,#0e2c18); box-shadow:0 0 12px rgba(63,185,80,.35); }
+  .clock{ position:absolute; right:10px; top:8px; font-size:11px; color:var(--muted); z-index:4; }
+  /* causal arrows + batch formers */
+  .t3-conduit{ position:relative; height:38px; margin:-2px 0 8px; display:flex; align-items:center; justify-content:center; }
+  .t3-conduit .clabel{ font-size:12px; color:var(--muted); transition:color .3s; z-index:2; background:var(--panel); padding:0 8px; }
+  .t3-conduit.hot .clabel{ color:var(--green); }
+  .t3-conduit::before{ content:""; position:absolute; left:50%; top:4px; bottom:4px; width:2px; transform:translateX(-50%);
+    background:repeating-linear-gradient(to top, #2b3340 0 6px, transparent 6px 12px); }
+  .t3-conduit.hot::before{ background:repeating-linear-gradient(to top, var(--green) 0 6px, transparent 6px 12px); opacity:.5; }
+  .t3-conduit .spark{ position:absolute; left:50%; bottom:3px; width:11px; height:11px; border-radius:50%;
+    background:var(--green); opacity:0; box-shadow:0 0 12px var(--green); transform:translateX(-50%); z-index:3; }
+  .t3-conduit.hot .spark{ animation:rise .85s linear infinite; }
+  .t3-conduit.hot .spark.s2{ animation-delay:.42s; }
+  @keyframes rise{ 0%{opacity:0; transform:translate(-50%,0);} 15%{opacity:1;} 100%{opacity:0; transform:translate(-50%,-34px);} }
+  .pipe.fed .lane{ border-color:var(--green); box-shadow:0 0 10px rgba(63,185,80,.25); }
+  .formers{ display:flex; gap:14px; justify-content:center; flex-wrap:wrap; }
+  .former{ flex:1; min-width:240px; max-width:320px; background:#0d1117; border:1px solid var(--line);
+    border-radius:10px; padding:10px 12px; transition:border-color .3s, box-shadow .3s; }
+  .former h5{ margin:0 0 6px; font-size:13px; }
+  .former .hitbox{ font-size:12px; color:var(--muted); margin-bottom:8px; }
+  .former .hitbox b{ color:var(--amber); font-size:14px; }
+  .former.ready{ border-color:var(--green); box-shadow:0 0 12px rgba(63,185,80,.2); }
+  .former.ready .hitbox b{ color:var(--green); }
+  .mbrow{ display:flex; gap:6px; }
+  .mbchip{ flex:1; height:24px; border-radius:6px; background:#11161f; border:1px solid var(--line);
+    display:flex; align-items:center; justify-content:center; font-size:11px; color:var(--muted); opacity:.3; transition:all .25s; }
+  .mbchip.on{ opacity:1; background:linear-gradient(180deg,#143055,#102844); border-color:var(--blue); color:#cfe6ff; }
+  .mbchip.fixed{ opacity:1; background:linear-gradient(180deg,#0f3a1d,#0e2c18); border-color:var(--green); color:#c4f7d4; }
+  .former .chk{ font-size:12px; color:var(--green); margin-top:6px; min-height:16px; }
+
+  /* ---------- tab4: thread relationships ---------- */
+  .t4wrap{ display:flex; gap:18px; flex-wrap:wrap; }
+  .t4flow{ flex:2; min-width:340px; display:flex; flex-direction:column; align-items:stretch; gap:0; }
+  .t4why{ flex:1; min-width:280px; display:flex; flex-direction:column; gap:10px; }
+  .tbox{ border:1px solid var(--line); border-radius:10px; padding:10px 12px; background:#0d1117; position:relative; }
+  .tbox .tname{ font-size:14px; font-weight:700; display:flex; align-items:center; gap:8px; }
+  .tbox .tdesc{ font-size:11.5px; color:var(--muted); margin-top:3px; }
+  .tbox .pin{ font-size:10px; padding:1px 7px; border-radius:999px; border:1px solid var(--line); color:var(--muted); }
+  .tbox.thread-hit{ border-left:4px solid var(--purple); }
+  .tbox.thread-io{ border-left:4px solid var(--blue); }
+  .tbox.thread-sync{ border-left:4px solid var(--cyan); }
+  .tbox.sched{ border-left:4px solid var(--muted); background:#11161f; }
+  .tarrow{ text-align:center; color:var(--muted); font-size:11px; padding:5px 0; position:relative; }
+  .tarrow b{ color:var(--text); }
+  .minnode{ align-self:center; margin:6px 0; border:1.5px solid var(--amber); border-radius:999px;
+    padding:7px 16px; font-size:12px; color:#ffe2ab; background:#1d1808; font-weight:600; }
+  .minnode.g2{ border-color:var(--green); color:#c4f7d4; background:#0f2a18; }
+  .minnode small{ display:block; font-size:10px; color:var(--muted); font-weight:400; }
+  .whycard{ border:1px solid var(--line); border-radius:10px; padding:11px 13px; background:var(--panel2); }
+  .whycard h4{ margin:0 0 6px; font-size:13px; }
+  .whycard p{ margin:0; font-size:12px; color:var(--muted); }
+  .whycard.a{ border-left:4px solid var(--purple); }
+  .whycard.b{ border-left:4px solid var(--cyan); }
+  .whycard.c{ border-left:4px solid var(--green); }
+  .whycard code{ font-size:11px; }
+
+  /* ---------- tab4 animation states ---------- */
+  #scene4 .tbox,#scene4 .tarrow,#scene4 .minnode,#scene4 .whycard{ transition:all .3s; }
+  #scene4 .dimmed{ opacity:.3; }
+  .tbox.lit{ box-shadow:0 0 18px rgba(88,166,255,.45); transform:translateX(5px); }
+  .tbox.thread-hit.lit{ box-shadow:0 0 18px rgba(188,140,255,.55); }
+  .tbox.thread-io.lit{ box-shadow:0 0 18px rgba(88,166,255,.55); }
+  .tbox.thread-sync.lit{ box-shadow:0 0 18px rgba(86,212,221,.55); }
+  .tbox.sched.lit{ box-shadow:0 0 18px rgba(63,185,80,.4); }
+  .tarrow.lit{ color:var(--green); font-weight:700; }
+  .tarrow.lit b{ color:var(--green); }
+  .tarrow.lit::after{ content:" ●"; color:var(--green); animation:t4blink .7s infinite; }
+  @keyframes t4blink{ 0%,100%{opacity:.2;} 50%{opacity:1;} }
+  .minnode.lit{ box-shadow:0 0 22px rgba(210,153,34,.65); transform:scale(1.06); }
+  .minnode.g2.lit{ box-shadow:0 0 22px rgba(63,185,80,.65); }
+  .whycard.lit{ box-shadow:0 0 18px rgba(63,185,80,.35); transform:translateY(-3px); border-left-width:6px; }
+
+  /* ---------- tab5: two-request full lifecycle story ---------- */
+  #scene5 .story-top{ display:flex; gap:14px; align-items:stretch; margin-bottom:14px; }
+  .gpu-badge{ width:120px; border:1px solid var(--line); border-radius:12px; background:#0d1117;
+    display:flex; flex-direction:column; align-items:center; justify-content:center; gap:2px;
+    font-size:12px; color:var(--muted); transition:all .3s; }
+  .gpu-badge .ic{ font-size:22px; }
+  .gpu-badge.busy{ border-color:var(--blue); color:#cfe6ff; box-shadow:0 0 18px rgba(88,166,255,.45);
+    background:linear-gradient(180deg,#143055,#102844); animation:gpupulse 1s infinite; }
+  @keyframes gpupulse{ 0%,100%{box-shadow:0 0 12px rgba(88,166,255,.3);} 50%{box-shadow:0 0 22px rgba(88,166,255,.6);} }
+  .l3box{ flex:1; border:1px solid var(--line); border-radius:12px; background:#0d1117; padding:8px 12px; transition:all .3s; }
+  .l3box.hot{ border-color:var(--green); box-shadow:0 0 14px rgba(63,185,80,.25); }
+  .l3title{ font-size:12px; color:var(--muted); margin-bottom:6px; display:flex; justify-content:space-between; align-items:center; }
+  .l3title .badge{ font-size:10px; padding:1px 8px; border-radius:999px; border:1px solid var(--line); color:var(--muted); min-width:46px; text-align:center; }
+  .l3title .badge.miss{ border-color:var(--red); color:#ffd4d0; }
+  .l3title .badge.hit{ border-color:var(--green); color:#c4f7d4; }
+  .pagerow{ display:flex; gap:6px; flex-wrap:wrap; min-height:28px; }
+  .pg{ width:38px; height:26px; border-radius:6px; border:1px solid var(--line); background:#11161f;
+    display:flex; align-items:center; justify-content:center; font-size:11px; color:#5b6470;
+    transition:all .3s; opacity:0; transform:scale(.6); }
+  .pg.show{ opacity:1; transform:none; }
+  .pg.l3{ border-color:var(--green); color:#c4f7d4; background:linear-gradient(180deg,#0f3a1d,#0e2c18); }
+
+  #scene5 .ranks{ display:flex; flex-direction:column; gap:10px; }
+  .ranklane{ border:1px solid var(--line); border-radius:12px; padding:8px 12px; background:var(--panel2); transition:all .3s; }
+  .ranklane.active{ border-color:var(--blue); box-shadow:0 0 12px rgba(88,166,255,.22); }
+  .rankhdr{ display:flex; align-items:center; gap:10px; margin-bottom:6px; font-size:12px; }
+  .rankhdr .rname{ font-weight:700; color:var(--text); }
+  .rankhdr .tps{ display:flex; gap:4px; }
+  .rankhdr .tp{ width:8px; height:8px; border-radius:50%; background:#11161f; border:1px solid var(--line); transition:all .25s; }
+  .rankhdr .tp.on{ background:var(--blue); border-color:var(--blue); box-shadow:0 0 6px var(--blue); }
+  .rankhdr .rstat{ margin-left:auto; font-size:11px; padding:1px 9px; border-radius:999px; border:1px solid var(--line); color:var(--muted); }
+  .rankhdr .rstat.miss{ border-color:var(--red); color:#ffd4d0; }
+  .rankhdr .rstat.hit{ border-color:var(--green); color:#c4f7d4; }
+  .rankhdr .rstat.warn{ border-color:var(--amber); color:#ffe2ab; }
+  .htree{ display:flex; align-items:center; gap:8px; min-height:28px; }
+  .htree .root{ font-size:10px; color:var(--muted); padding:2px 7px; border:1px dashed var(--line); border-radius:6px; }
+  .htnode{ width:40px; height:26px; border-radius:6px; border:1px solid var(--line); background:#11161f;
+    display:flex; align-items:center; justify-content:center; font-size:11px; color:#5b6470; position:relative;
+    transition:all .35s; opacity:0; transform:translateY(-6px) scale(.7); }
+  .htnode::before{ content:""; position:absolute; left:-8px; top:50%; width:8px; height:1px; background:var(--line); }
+  .htnode:first-of-type::before{ display:none; }
+  .htnode.show{ opacity:1; transform:none; }
+  .htnode.committed{ border-color:var(--green); color:#c4f7d4; background:linear-gradient(180deg,#0f3a1d,#0e2c18); }
+  .htnode.inserting{ border-color:var(--blue); color:#cfe6ff; background:linear-gradient(180deg,#143055,#102844); }
+  .htnode.matched{ border-color:var(--cyan); color:#cdf6fa; box-shadow:0 0 8px rgba(86,212,221,.4); }
+  .htnode.warn{ border-color:var(--amber); color:#ffe2ab; background:#1d1808; }
+  .htnode.evict{ border-color:var(--red); color:#ffd4d0; background:linear-gradient(180deg,#3a1414,#2a1010); }
+  .story-sync{ display:flex; gap:14px; justify-content:center; margin:14px 0 6px; flex-wrap:wrap; }
+  .syncbadge{ font-size:12px; padding:6px 14px; border-radius:999px; border:1.5px solid var(--line); color:var(--muted); transition:all .3s; }
+  .syncbadge.g1.fire{ border-color:var(--amber); color:#ffe2ab; background:#1d1808; box-shadow:0 0 16px rgba(210,153,34,.45); transform:scale(1.04); }
+  .syncbadge.g2.fire{ border-color:var(--green); color:#c4f7d4; background:#0f2a18; box-shadow:0 0 16px rgba(63,185,80,.45); transform:scale(1.04); }
+  .consist-flag{ text-align:center; font-weight:700; font-size:14px; min-height:20px; margin-top:4px; }
+  .consist-flag.ok{ color:var(--green); } .consist-flag.bad{ color:var(--red); }
+
+  /* ---------- tab6: PrefetchAck alignment & anti-hang ---------- */
+  #scene6 .ackmesh{ display:flex; flex-direction:column; gap:8px; margin-bottom:6px; }
+  .ackrow{ display:flex; align-items:center; gap:10px; border:1px solid var(--line); border-radius:10px; padding:6px 10px; background:var(--panel2); transition:all .3s; }
+  .ackrow.blocked{ border-color:var(--red); box-shadow:0 0 12px rgba(248,81,73,.3); }
+  .acklabel{ width:190px; font-size:12px; color:var(--muted); }
+  .acklabel b{ color:var(--text); }
+  .ackslots{ flex:1; display:flex; gap:8px; }
+  .ackchip{ flex:1; height:34px; border-radius:8px; border:1px solid var(--line); background:#0d1117;
+    display:flex; align-items:center; justify-content:center; font-size:11px; color:#5b6470; gap:5px;
+    transition:all .3s; position:relative; }
+  .ackchip .err{ font-size:9px; color:var(--red); }
+  .ackchip.pending{ opacity:.4; }
+  .ackchip.emit{ border-color:var(--blue); color:#cfe6ff; background:linear-gradient(180deg,#143055,#102844); box-shadow:0 0 10px rgba(88,166,255,.4); }
+  .ackchip.passed{ border-color:var(--green); color:#c4f7d4; background:linear-gradient(180deg,#0f3a1d,#0e2c18); }
+  .ackchip.wait{ border-color:var(--amber); color:#ffe2ab; background:#1d1808; animation:pulse 1s infinite; }
+  .ackchip.missing{ border-color:var(--red); border-style:dashed; color:#ffd4d0; background:#2a1010; }
+  .barriers{ border:1px dashed var(--line); border-radius:10px; padding:8px 10px; margin-top:4px; }
+  .barlabel{ font-size:12px; color:var(--muted); margin-bottom:6px; text-align:center; }
+  .barrow{ display:flex; gap:8px; align-items:stretch; }
+  .barrow .barspacer{ width:190px; }
+  .barcols{ flex:1; display:flex; gap:8px; }
+  .bar{ flex:1; border:1.5px solid var(--line); border-radius:8px; padding:6px 4px; text-align:center;
+    font-size:11px; color:var(--muted); transition:all .3s; }
+  .bar .bcount{ display:block; font-size:13px; font-weight:800; margin-top:2px; color:#5b6470; }
+  .bar.waiting{ border-color:var(--amber); color:#ffe2ab; background:#1d1808; animation:pulse 1s infinite; }
+  .bar.waiting .bcount{ color:#ffe2ab; }
+  .bar.fired{ border-color:var(--green); color:#c4f7d4; background:#0f2a18; }
+  .bar.fired .bcount{ color:#c4f7d4; }
+  .bar.dead{ border-color:var(--red); color:#ffd4d0; background:#2a1010; box-shadow:0 0 12px rgba(248,81,73,.4); }
+  .bar.dead .bcount{ color:#ffd4d0; }
+</style>
+</head>
+<body>
+<button id="langBtn" onclick="toggleLang()">EN</button>
+<header>
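+  <h1>HiCache × Pipeline Parallelism: Tree Consistency &amp; Deadlock Avoidance</h1>
+  <p>Topology <b>PP=3 × TP=8 = 24 ranks</b> · rows = TP groups, columns = PP groups · MIN all-reduce keeps the radix tree consistent · two sets of gloo groups keep background collectives from deadlocking</p>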
+</header>
+
+<div class="tabs">
+  <button class="tab active" data-tab="story">① Two-request walkthrough (L3 hit / miss · consistent host tree)</button>
+  <button class="tab" data-tab="consistency">② Tree consistency (auto-play)</button>
+  <button class="tab" data-tab="deadlock">③ Why two group sets do not deadlock</button>
+  <button class="tab" data-tab="skew">④ Async skew × MIN keeps ranks in step</button>
+  <button class="tab" data-tab="threads">⑤ Thread relationships &amp; tree consistency</button>
+  <button class="tab" data-tab="ackalign">⑥ PrefetchAck alignment &amp; hang prevention</button>
+</div>
+
+<div class="wrap">
+  <!-- ============ TAB 5 (story, shown first) ============ -->
+  <div class="scene" id="scene5">
+    <div class="legend">
+      <span class="chip"><span class="sw" style="background:var(--blue)"></span>GPU compute / inserting</span>
+      <span class="chip"><span class="sw" style="background:var(--cyan)"></span>prefix matched</span>
+      <span class="chip"><span class="sw" style="background:var(--amber)"></span>ranks disagree (awaiting MIN)</span>
+      <span class="chip"><span class="sw" style="background:var(--green)"></span>committed / consistent</span>
+      <span class="chip"><span class="sw" style="background:var(--red)"></span>miss / evicted</span>
+    </div>
+    <div class="story-top">
+      <div class="gpu-badge" id="gpuBadge"><span class="ic">▣</span><span>GPU compute</span></div>
+      <div class="l3box" id="l3box">
+        <div class="l3title"><span class="l3lab"><b>L3 persistent storage</b> (storage backend, view shared by the 3 ranks)</span><span class="badge" id="l3badge"></span></div>
+        <div class="pagerow" id="l3pages"></div>
+      </div>
+    </div>
+    <div class="ranks" id="ranks"></div>
+    <div class="story-sync">
+      <div class="syncbadge g1" id="s5sync1">◆ MIN group 1 · prefetch_hits_sync_groups · storage_hit_count</div>
+      <div class="syncbadge g2" id="s5sync2">◆ MIN group 2 · prefetch_completion_sync_groups · completed_tokens</div>
+    </div>
+    <div class="consist-flag" id="s5flag"></div>
+    <div class="caption" id="cap5">Auto-playing…</div>
+  </div>
+  <div class="controls" id="ctl5">
+    <button class="ctl primary" id="play5">⏸ Pause</button>
+    <button class="ctl" id="replay5">⟲ Replay</button>
+  </div>
+
+  <!-- ============ TAB 1 ============ -->
+  <div class="scene hidden" id="scene1">
+    <div class="legend">
+      <span class="chip"><span class="sw" style="background:var(--amber)"></span>hit count truncated (inconsistent)</span>
+      <span class="chip"><span class="sw" style="background:var(--blue)"></span>after MIN within the TP group</span>
+      <span class="chip"><span class="sw" style="background:var(--green)"></span>after MIN within the PP group (globally consistent)</span>
+    </div>
+    <div id="mesh1"></div>
+    <div class="tree-box">
+      <div class="tree-title" id="treeTitle">All 24 ranks hold the same radix tree</div>
+      <div class="tree" id="sharedTree"></div>
+    </div>
+    <div class="caption" id="cap1">Auto-playing…</div>
+  </div>
+  <div class="controls">
+    <button class="ctl primary" id="play1">⏸ Pause</button>
+    <button class="ctl" id="replay1">⟲ Replay</button>
+  </div>
+
+  <!-- ============ TAB 2 ============ -->
+  <div class="scene hidden" id="scene2">
+    <div class="legend">
+      <span class="chip"><span class="sw" style="background:var(--purple)"></span><b>prefetch_thread</b> (independent background thread) · reduce(storage_hit_count)</span>
+      <span class="chip"><span class="sw" style="background:var(--cyan)"></span><b>prefetch_sync_thread</b> (independent background thread) · reduce(completed_tokens)</span>
+    </div>
+    <div class="note" style="margin-bottom:8px;">Each cell is one rank and contains two independent background threads (the small dots ●A and ●B). Each row is a <b>TP communicator</b>; each column is a <b>PP communicator</b>.</div>
+    <div id="mesh2"></div>
+    <div class="groups">
+      <div class="grp g1" id="g1"><b>prefetch_hits_sync_groups</b><br>Groups that reduce the hit page count (TP ring + PP ring)<br><span style="font-size:11px">reduce(storage_hit_count)</span></div>
+      <div class="grp g2" id="g2"><b>prefetch_completion_sync_groups</b><br>Groups that reduce the completed token count (TP ring + PP ring)<br><span style="font-size:11px">reduce(completed_tokens)</span></div>
+    </div>
+    <div class="banner" id="banner2"></div>
+    <div class="caption" id="cap2">Pick a scenario: <b>one group set</b> deadlocks; <b>two group sets</b> are safe.</div>
+  </div>
+  <div class="controls" id="ctl2">
+    <button class="ctl alt" id="play1grp">▶ One group set (deadlock)</button>
+    <button class="ctl primary" id="play2grp">▶ Two group sets (safe)</button>
+    <button class="ctl" id="reset2">Reset</button>
+  </div>
+
+  <!-- ============ TAB 3 ============ -->
+  <div class="scene hidden" id="scene3">
+    <!-- layer 3: pipeline timing (top) -->
+    <div class="t3-section">
+      <div class="t3-title">③ Main PP pipeline execution <strong>timeline</strong> <span class="tag gpu">NCCL · GPU</span>
+        <span style="color:var(--muted);font-size:11px;">Continuous and staggered, <strong style="color:var(--green)">never interrupted by background prefetch sync</strong></span>
+      </div>
+      <div class="pipe" id="pipe">
+        <div class="lane s0"><span class="lname">PP stage 0</span><span class="lane-hint" id="hint0"></span></div>
+        <div class="lane s1"><span class="lname">PP stage 1</span><span class="lane-hint" id="hint1"></span></div>
+        <div class="lane s2"><span class="lname">PP stage 2</span><span class="lane-hint" id="hint2"></span></div>
+      </div>
+      <div class="flow-note">↑ The pipeline runs exactly the <strong>mb0→mb3</strong> formed in ②, advancing diagonally and staggered across stage0→1→2</div>
+    </div>
+
+    <div class="t3-conduit" id="arrow1">
+      <span class="clabel">▲ The formed <b>batch &amp; micro-batch order</b> is fed to the pipeline (content)</span>
+      <span class="spark"></span><span class="spark s2"></span>
+    </div>
+
+    <!-- layer 2: batch former (middle) -->
+    <div class="t3-section">
+      <div class="t3-title">② The three PP ranks form batches from <strong>the same storage hit</strong> (contents must match on every rank)</div>
+      <div class="formers" id="formers"></div>
+    </div>
+
+    <div class="t3-conduit" id="arrow2">
+      <span class="clabel">▲ <code>all_reduce(MIN)</code> outputs the unified value <b>6</b> → determines the batch size</span>
+      <span class="spark"></span><span class="spark s2"></span>
+    </div>
+
+    <!-- layer 1: async prefetch + MIN (bottom = causal source) -->
+    <div class="t3-section">
+      <div class="t3-title">① Async prefetch query → <code>all_reduce(MIN)</code> <span class="tag cpu">gloo · CPU background thread</span></div>
+      <div class="sync-area" id="syncArea">
+        <div class="clock" id="t3clock">t = 0.0s</div>
+        <div class="barrier" id="barrier" style="left:62%"><span class="blabel">all_reduce(MIN)</span></div>
+        <div class="slabel" id="sl0">rank pp0</div><div class="pkt travel" id="pkt0"><span>pp0 storage hit</span><span class="hv">8</span></div>
+        <div class="slabel" id="sl1">rank pp1</div><div class="pkt travel" id="pkt1"><span>pp1 storage hit</span><span class="hv">6</span></div>
+        <div class="slabel" id="sl2">rank pp2</div><div class="pkt travel" id="pkt2"><span>pp2 storage hit</span><span class="hv">7</span></div>
+      </div>
+    </div>
+    <div class="caption" id="cap3">Auto-playing…</div>
+  </div>
+  <div class="controls" id="ctl3">
+    <button class="ctl primary" id="play3">⏸ Pause</button>
+    <button class="ctl" id="replay3">⟲ Replay</button>
+  </div>
+
+  <!-- ============ TAB 4 ============ -->
+  <div class="scene hidden" id="scene4">
+    <div class="t4wrap">
+      <!-- left: thread data-flow -->
+      <div class="t4flow">
+        <div class="tbox sched" id="t4b0">
+          <div class="tname">Scheduler <span class="pin">main thread</span></div>
+          <div class="tdesc">Issues prefetch requests (writeback / load)</div>
+        </div>
+        <div class="tarrow" id="t4a0">▼ <b>prefetch_queue</b> (PrefetchOperation)</div>
+
+        <div class="tbox thread-hit" id="t4b1">
+          <div class="tname">① prefetch_thread <span class="pin">storage-hit thread</span></div>
+          <div class="tdesc"><code>_storage_hit_query()</code> queries the number of pages hit in L3; enough hits → prefetch_buffer, too few → prefetch_revoke_queue</div>
+        </div>
+        <div class="minnode" id="t4m1">◆ all_reduce(MIN) storage_hit_count
+          <small>@ prefetch_hits_sync_groups (group 1, gloo/CPU, TP ring + PP ring)</small></div>
+        <div class="tarrow" id="t4a1">▼ <b>prefetch_buffer</b></div>
+
+        <div class="tbox thread-io" id="t4b2">
+          <div class="tname">② prefetch_io_aux_thread <span class="pin">IO loading thread</span></div>
+          <div class="tdesc"><code>_page_transfer()</code> reads pages from L3 into host memory batch by batch and accumulates <b>completed_tokens</b>; <b>each storage batch produces exactly one PrefetchAck</b> (even on error)</div>
+        </div>
+        <div class="tarrow" id="t4a2">▼ <b>prefetch_sync_queue</b> (PrefetchAck)</div>
+
+        <div class="tbox thread-sync" id="t4b3">
+          <div class="tname">③ prefetch_sync_thread <span class="pin">completion-token thread</span></div>
+          <div class="tdesc">Reduces the <b>completed_tokens</b> of every ack</div>
+        </div>
+        <div class="minnode g2" id="t4m2">◆ all_reduce(MIN) completed_tokens
+          <small>@ prefetch_completion_sync_groups (group 2, gloo/CPU, TP ring + PP ring)</small></div>
+        <div class="tarrow" id="t4a3">▼ <b>ack_prefetch_queue</b></div>
+
+        <div class="tbox sched" id="t4b4">
+          <div class="tname">Scheduler writes to the host radix tree</div>
+          <div class="tdesc">Inserts only the prefix of length <b>completed_tokens</b> → <code>_insert_helper_host()</code></div>
+        </div>
+      </div>
+
+      <!-- right: why consistent -->
+      <div class="t4why">
+        <div class="whycard a" id="t4wa">
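+          <h4>Why does MIN(storage_hit) give consistency?</h4>
+          <p>Hit counts can differ across ranks (host-memory truncation, differing L3 views). MIN selects the <b>longest prefix that every rank can hit</b> → all ranks <b>fetch the same range</b> instead of each fetching a different length.</p>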
+        </div>
+        <div class="whycard b" id="t4wb">
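+          <h4>Why does MIN(completed_tokens) give consistency?</h4>
+          <p>Even with an identical fetch range, page-by-page loading can still <b>partially fail</b> (<code>page_get</code> returns n≠batch). MIN commits only the <b>longest common prefix that every rank loaded successfully</b> → the length written to the host tree is the same on every rank.</p>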
+        </div>
+        <div class="whycard c" id="t4wc">
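+          <h4>Why does it not hang?</h4>
+          <p>Each storage batch <b>produces exactly one PrefetchAck</b> (even on error) → every rank takes part in <b>exactly the same number</b> of reduces, so the collectives line up one-to-one. Together the two MINs guarantee that <b>the prefix inserted into the host tree is identical on every rank → the trees are consistent</b>.</p>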
+        </div>
+      </div>
+    </div>
+    <div class="caption" id="cap4">Two MIN sync points (group 1: hit count, group 2: completed count) plus exactly one ack per batch together keep the host radix tree strictly consistent across PP ranks.</div>
+  </div>
+  <div class="controls" id="ctl4">
+    <button class="ctl primary" id="play4">⏸ Pause</button>
+    <button class="ctl" id="replay4">⟲ Replay</button>
+  </div>
+
+  <!-- ============ TAB 6 : PrefetchAck alignment & anti-hang ============ -->
+  <div class="scene hidden" id="scene6">
+    <div class="note" style="margin-bottom:10px;">Each <b>storage batch</b> always produces <b>one PrefetchAck</b> inside <code>_page_transfer</code>; <code>prefetch_sync_thread</code> runs one <code>all_reduce(MIN)</code> on group 2 for <strong>every ack</strong>. So <b>ack count = batch count = number of group-2 collectives</b>, and it must be equal on every rank.</div>
+    <div class="legend">
+      <span class="chip"><span class="sw" style="background:var(--blue)"></span>ack produced (joins the reduce for this round)</span>
+      <span class="chip"><span class="sw" style="background:var(--green)"></span>barrier has 3/3 → passes</span>
+      <span class="chip"><span class="sw" style="background:var(--amber)"></span>arrived, waiting for the absent rank</span>
+      <span class="chip"><span class="sw" style="background:var(--red)"></span>missing ack → waits forever</span>
+    </div>
+    <div class="ackmesh" id="ackmesh"></div>
+    <div class="barriers">
+      <div class="barlabel">◆ <code>all_reduce(MIN)</code> @ group 2 (prefetch_completion_sync_groups) · one barrier per ack</div>
+      <div class="barrow"><div class="barspacer"></div><div class="barcols" id="barcols"></div></div>
+    </div>
+    <div class="banner" id="banner6"></div>
+    <div class="caption" id="cap6">Pick a scenario: <b>always one ack per batch</b> → counts line up, safe; <b>break on error</b> → one ack goes missing → group-2 reduces fall out of step → hang.</div>
+  </div>
+  <div class="controls" id="ctl6">
+    <button class="ctl primary" id="play6good">▶ Correct (always one ack per batch)</button>
+    <button class="ctl alt" id="play6bad">▶ Wrong (break on error → missing ack)</button>
+    <button class="ctl" id="reset6">Reset</button>
+  </div>
+</div>
+
+<script>
+const PP=3, TP=8;
+// initial hit counts per (pp,tp). some truncated by host-mem pressure.
+const HITS=[
+  [8,8,8,8,8,8,8,8],
+  [8,8,6,8,8,8,8,7],   // stage1: rank(1,2)=6, rank(1,7)=7 truncated
+  [8,8,8,8,8,8,8,8],
+];
+const ROWMIN = HITS.map(r=>Math.min(...r));      // [8,6,8]
+const GMIN = Math.min(...ROWMIN);                // 6
+const sleep=ms=>new Promise(r=>setTimeout(r,ms));
+
+/* ---------- build a mesh ---------- */
+function buildMesh(containerId, withThreads){
+  const c=document.getElementById(containerId);
+  let html='<div class="mesh-head"><div class="tp-hdr">';
+  for(let t=0;t<TP;t++) html+=`<div class="th">TP ${t}</div>`;
+  html+='</div></div>';
+  for(let p=0;p<PP;p++){
+    html+=`<div class="pp-row"><div class="pp-label"><b>PP stage ${p}</b><br>(TP group)</div><div class="row-cells" id="${containerId}-row${p}">`;
+    for(let t=0;t<TP;t++){
+      html+=`<div class="cell" id="${containerId}-c${p}-${t}">
+        <div class="v">—</div>
+        <div class="rk">r${p*TP+t}</div>
+        ${withThreads?`<div class="tdots"><span class="td a" id="${containerId}-A-${p}-${t}"></span><span class="td b" id="${containerId}-B-${p}-${t}"></span></div>`:''}
+      </div>`;
+    }
+    html+='</div></div>';
+  }
+  // pp-group footer (columns)
+  html+='<div class="pp-foot"><div class="lab">PP groups (columns)<br>each column spans 3 stages →</div><div class="cols">';
+  for(let t=0;t<TP;t++) html+=`<div class="col" id="${containerId}-col${t}">r${t}·r${t+TP}·r${t+2*TP}</div>`;
+  html+='</div></div>';
+  c.innerHTML=html;
+}
+const cell=(m,p,t)=>document.getElementById(`${m}-c${p}-${t}`);
+const val =(m,p,t)=>cell(m,p,t).querySelector('.v');
+
+buildMesh('mesh1',false);
+buildMesh('mesh2',true);
+
+/* ============================================================
+   TAB 1 : auto-play loop
+   query -> diverge -> TP all_reduce(MIN) -> PP all_reduce(MIN) -> consistent tree
+   ============================================================ */
+let t1Token=0, t1Paused=false;
+const cap1=document.getElementById('cap1');
+const sharedTree=document.getElementById('sharedTree');
+
+function resetMesh1(){
+  for(let p=0;p<PP;p++) for(let t=0;t<TP;t++){
+    const cl=cell('mesh1',p,t); cl.className='cell';
+    val('mesh1',p,t).innerHTML='—';
+  }
+  sharedTree.innerHTML='';
+}
+async function gate(my){ while(t1Paused){ await sleep(120); if(my!==t1Token) throw 0; } }
+async function step(ms,my){ await sleep(ms); await gate(my); if(my!==t1Token) throw 0; }
+
+async function runTab1(){
+  const my=++t1Token;
+  try{
+    while(true){
+      resetMesh1();
+      cap1.innerHTML='Topology <b>PP=3 × TP=8 = 24 ranks</b>: each PP stage hosts 8 TP ranks.';
+      await step(1600,my);
+
+      // 1) independent query
+      for(let p=0;p<PP;p++) for(let t=0;t<TP;t++){
+        const v=HITS[p][t]; const cl=cell('mesh1',p,t);
+        val('mesh1',p,t).innerHTML=v+' <small>pg</small>';
+        if(v!==8) cl.classList.add('varied');
+        await sleep(35);
+      }
+      cap1.innerHTML='① Each rank queries L3 for prefix hits <span class="k">independently</span>. <b style="color:var(--amber)">Note that r10 and r15 are truncated by host-memory pressure</b> (6 / 7 pages).';
+      await step(2200,my);
+
+      // 2) diverge warning
+      cap1.innerHTML='② If each rank built its radix tree from its own hit count → tree depths diverge → later PP collectives hit a <b style="color:var(--red)">shape mismatch → crash</b>.';
+      for(let p=0;p<PP;p++) for(let t=0;t<TP;t++) if(HITS[p][t]!==8){ cell('mesh1',p,t).classList.add('bad'); }
+      await step(2200,my);
+      for(let p=0;p<PP;p++) for(let t=0;t<TP;t++) cell('mesh1',p,t).classList.remove('bad','varied');
+
+      // 3) TP all_reduce(MIN) — sweep each row (all rows in parallel)
+      cap1.innerHTML='③ Step 1: <code>all_reduce(MIN)</code> within each <span class="k">TP group (each row of 8 ranks)</span>.';
+      for(let t=0;t<TP;t++){
+        for(let p=0;p<PP;p++) cell('mesh1',p,t).classList.add('sweep');
+        await step(110,my);
+        for(let p=0;p<PP;p++) cell('mesh1',p,t).classList.remove('sweep');
+      }
+      for(let p=0;p<PP;p++) for(let t=0;t<TP;t++){
+        const cl=cell('mesh1',p,t); cl.classList.add('tpmin');
+        val('mesh1',p,t).innerHTML=ROWMIN[p]+' <small>pg</small>';
+      }
+      cap1.innerHTML='③ After the TP-group reduce: <b>every row is consistent</b> (PP0=8, PP1=6, PP2=8, the per-row minimum).';
+      await step(1900,my);
+
+      // 4) PP all_reduce(MIN) — sweep each column (top->bottom)
+      cap1.innerHTML='④ Step 2: <code>all_reduce(MIN)</code> within each <span class="k">PP group (each column of 3 ranks)</span> → converges to the global minimum.';
+      for(let p=0;p<PP;p++){
+        for(let t=0;t<TP;t++) cell('mesh1',p,t).classList.add('sweep');
+        await step(180,my);
+        for(let t=0;t<TP;t++) cell('mesh1',p,t).classList.remove('sweep');
+      }
+      for(let p=0;p<PP;p++) for(let t=0;t<TP;t++){
+        const cl=cell('mesh1',p,t); cl.classList.remove('tpmin'); cl.classList.add('gmin');
+        val('mesh1',p,t).innerHTML=GMIN+' <small>pg</small>';
+      }
+      cap1.innerHTML='④ After the PP-group reduce: <b style="color:var(--green)">all 24 ranks have hit count = 6</b> (the longest common prefix).';
+      await step(1700,my);
+
+      // 5) shared consistent tree
+      cap1.innerHTML='⑤ Every rank prefetches / builds its tree only up to 6 → <span style="color:var(--green)">the radix trees on all 24 ranks are identical ✓</span>';
+      sharedTree.innerHTML='';
+      for(let i=0;i<GMIN;i++){
+        const n=document.createElement('div'); n.className='tnode'; n.textContent='page '+i; sharedTree.appendChild(n);
+        await step(120,my); n.classList.add('show');
+      }
+      await step(2600,my);
+    }
+  }catch(e){ /* cancelled */ }
+}
+
+document.getElementById('play1').onclick=function(){
+  t1Paused=!t1Paused;
+  this.textContent=t1Paused?'▶ Play':'⏸ Pause';
+};
+document.getElementById('replay1').onclick=()=>{ t1Paused=false; document.getElementById('play1').textContent='⏸ Pause'; runTab1(); };
+
+/* ============================================================
+   TAB 2 : why 2 group-sets avoid deadlock (PP×TP mesh)
+   ============================================================ */
+let t2Token=0;
+const cap2=document.getElementById('cap2');
+const banner2=document.getElementById('banner2');
+const g1=document.getElementById('g1'), g2=document.getElementById('g2');
+const row2=p=>document.getElementById('mesh2-row'+p);
+const col2=t=>document.getElementById('mesh2-col'+t);
+const dotEl=(op,p,t)=>document.getElementById(`mesh2-${op}-${p}-${t}`);
+
+function resetMesh2(){
+  ++t2Token;
+  for(let p=0;p<PP;p++){
+    row2(p).className='row-cells';
+    for(let t=0;t<TP;t++){
+      const cl=cell('mesh2',p,t); cl.className='cell';
+      val('mesh2',p,t).innerHTML=GMIN+' <small>pg</small>';
+      dotEl('A',p,t).className='td a'; dotEl('B',p,t).className='td b';
+    }
+  }
+  for(let t=0;t<TP;t++) col2(t).className='col';
+  g1.classList.remove('hot'); g2.classList.remove('hot'); g2.style.opacity=1;
+  banner2.className='banner'; banner2.textContent='';
+}
+async function s2(ms,my){ await sleep(ms); if(my!==t2Token) throw 0; }
+
+/* ---- 1 shared group set -> deadlock ---- */
+async function play1Group(){
+  resetMesh2(); const my=t2Token;
+  try{
+    g2.style.opacity=.25; g1.classList.add('hot');
+    cap2.innerHTML='Only <b>one group set</b>: prefetch_thread (A) and prefetch_sync_thread (B) share the same set of communicators.';
+    await s2(800,my);
+
+    // each rank's two threads race: some submit A first (purple), some B first (cyan)
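+    cap2.innerHTML='The two background threads are <b>scheduled independently, in no fixed order</b>: within the same TP ring, some ranks issue A first and others issue B first.';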
+    for(let p=0;p<PP;p++){
+      for(let t=0;t<TP;t++){
+        const aFirst=((p+t)%2===0);      // deterministic but mixed within each row
+        const first=aFirst?'A':'B';
+        dotEl(first,p,t).classList.add('on');
+      }
+    }
+    await s2(1200,my);
+
+    // rings cannot align -> red
+    cap2.innerHTML='On the same communicator, ranks submit <b style="color:var(--red)">different collectives</b> (A and B are misaligned) → the rendezvous never matches.';
+    for(let p=0;p<PP;p++){
+      row2(p).classList.add('ring-bad');
+      for(let t=0;t<TP;t++){
+        cell('mesh2',p,t).classList.add('bad');
+        const aFirst=((p+t)%2===0);
+        dotEl(aFirst?'A':'B',p,t).className=`td ${aFirst?'a':'b'} dead`;
+      }
+    }
+    for(let t=0;t<TP;t++) col2(t).classList.add('ring-bad');
+    await s2(700,my);
+    banner2.className='banner bad'; banner2.textContent='💥 DEADLOCK: the whole 24-rank job hangs';
+    cap2.innerHTML='As soon as A and B interleave on any communicator, that ring deadlocks → PP/TP communication across the job stalls in a chain reaction.';
+  }catch(e){}
+}
+
+/* ---- 2 group sets -> safe ---- */
+async function play2Groups(){
+  resetMesh2(); const my=t2Token;
+  try{
+    g1.classList.add('hot'); g2.classList.add('hot');
+    cap2.innerHTML='With <b>two independent group sets</b>: <b style="color:var(--purple)">A always uses prefetch_hits_sync_groups</b> and <b style="color:var(--cyan)">B always uses prefetch_completion_sync_groups</b>.';
+    await s2(800,my);
+
+    // wave A: every rank's prefetch_thread submits A to group-set-1; TP rings + PP rings light purple
+    cap2.innerHTML='Wave 1: the <b>prefetch_thread</b> on every rank submits A only on <code>prefetch_hits_sync_groups</code> → the sequence is identical.';
+    for(let p=0;p<PP;p++){
+      row2(p).classList.add('ring-a');
+      for(let t=0;t<TP;t++) dotEl('A',p,t).classList.add('on');
+    }
+    for(let t=0;t<TP;t++) col2(t).classList.add('ring-a');
+    await s2(1100,my);
+    for(let p=0;p<PP;p++){
+      row2(p).className='row-cells';
+      for(let t=0;t<TP;t++) dotEl('A',p,t).className='td a done';
+    }
+    for(let t=0;t<TP;t++) col2(t).className='col';
+    cap2.innerHTML='✓ A has arrived on every TP ring and PP ring → the first wave of reduces completes.';
+    await s2(900,my);
+
+    // wave B: prefetch_sync_thread submits B to group-set-2; rings light cyan
+    cap2.innerHTML='Wave 2: the <b>prefetch_sync_thread</b> on every rank submits B only on <code>prefetch_completion_sync_groups</code> → the sequence is identical.';
+    for(let p=0;p<PP;p++){
+      row2(p).classList.add('ring-b');
+      for(let t=0;t<TP;t++) dotEl('B',p,t).classList.add('on');
+    }
+    for(let t=0;t<TP;t++) col2(t).classList.add('ring-b');
+    await s2(1100,my);
+    for(let p=0;p<PP;p++){
+      row2(p).className='row-cells';
+      for(let t=0;t<TP;t++){ dotEl('B',p,t).className='td b done'; cell('mesh2',p,t).classList.add('gmin'); }
+    }
+    for(let t=0;t<TP;t++) col2(t).className='col';
+    banner2.className='banner ok'; banner2.textContent='✅ Safe: all 24 ranks aligned and done';
+    cap2.innerHTML='The collective sequence on each communicator is <b style="color:var(--green)">identical</b> on all ranks (A→group 1, B→group 2, never crossing) → no deadlock.';
+  }catch(e){}
+}
+
+document.getElementById('play1grp').onclick=play1Group;
+document.getElementById('play2grp').onclick=play2Groups;
+document.getElementById('reset2').onclick=()=>{ resetMesh2(); cap2.innerHTML='Pick a scenario: <b>one group set</b> deadlocks; <b>two group sets</b> are safe.'; };
+
+/* ============================================================
+   TAB 3 : async PP skew  x  all_reduce(MIN) unifies pace
+   Top: continuous, skewed micro-batch pipeline (CSS infinite) — never pauses.
+   Bottom: time-driven (rAF). Each rank's prefetch op arrives at the MIN barrier
+   at a DIFFERENT wall-clock time (async skew). Early arrivals park & wait on the
+   gloo CPU group (background thread). When the slowest arrives, one MIN flash
+   unifies all three to 6 and they depart together — while the top pipeline keeps
+   flowing untouched.
+   ============================================================ */
+const NMB=4;                          // micro-batches per batch (illustrative)
+const MB_LABELS=Array.from({length:NMB},(_,k)=>'mb'+k);
+// build top pipeline micro-batches: one controllable block per (stage, mb)
+(function buildPipe(){
+  for(let s=0;s<3;s++){
+    const lane=document.querySelector('#pipe .s'+s);
+    for(let k=0;k<NMB;k++){
+      const mb=document.createElement('div');
+      mb.className='mb'; mb.id=`pmb-${s}-${k}`; mb.textContent=MB_LABELS[k];
+      lane.appendChild(mb);
+    }
+  }
+})();
+// build batch formers (one per PP rank): hit input + ordered mb chips
+(function buildFormers(){
+  const host=document.getElementById('formers');
+  for(let p=0;p<3;p++){
+    const f=document.createElement('div'); f.className='former'; f.id='former'+p;
+    let chips='';
+    for(let k=0;k<NMB;k++) chips+=`<div class="mbchip" id="fchip-${p}-${k}">${MB_LABELS[k]}</div>`;
+    f.innerHTML=`<h5>Scheduler · PP rank ${p}</h5>
+      <div class="hitbox"><span class="ht1">cached prefix storage hit = </span><b id="fhit${p}">?</b><span class="ht2"> pages → determines batch composition</span></div>
+      <div class="mbrow">${chips}</div>
+      <div class="chk" id="fchk${p}"></div>`;
+    host.appendChild(f);
+  }
+})();
+const arrow1=document.getElementById('arrow1');
+const arrow2=document.getElementById('arrow2');
+
+const PKT=[
+  { el:document.getElementById('pkt0'), lab:document.getElementById('sl0'), y:34,  hit:8, arrive:2.6 },
+  { el:document.getElementById('pkt1'), lab:document.getElementById('sl1'), y:85,  hit:6, arrive:1.9 }, // arrives first
+  { el:document.getElementById('pkt2'), lab:document.getElementById('sl2'), y:136, hit:7, arrive:3.9 }, // slowest -> everyone waits
+];
+const GMIN3=Math.min(...PKT.map(p=>p.hit)); // 6
+const T3={ START:0.4, SYNC:3.9, FLASH_END:4.5, DEPART_END:5.4,
+           DROP:4.5, MB_START:5.0, MB_STEP:0.35, READY:6.4, CYCLE:14.0,
+           PIPE_START:6.4, MB_LAG:1.0, STAGE_LAG:0.9, MB_TRAVEL:3.2,
+           X0:11, XB:62, X1:97 }; // seconds / percentages
+const cap3=document.getElementById('cap3');
+const barrier=document.getElementById('barrier');
+const t3clock=document.getElementById('t3clock');
+let t3raf=null, t3on=false, t3paused=false, t3start=0, t3lastT=0;
+
+function placePkt(p){ p.lab.style.top=p.y+'px'; p.el.style.top=p.y+'px'; }
+PKT.forEach(placePkt);
+function lerp(a,b,u){ return a+(b-a)*Math.max(0,Math.min(1,u)); }
+// returns progress 0..1 if t (or its wrapped form) is inside [st,en), else -1
+function pipeActive(st,en,t){
+  if(t>=st && t<en) return (t-st)/(en-st);
+  const tw=t+T3.CYCLE;
+  if(tw>=st && tw<en) return (tw-st)/(en-st);   // previous batch still draining across loop
+  return -1;
+}
+
+function resetFormers(){
+  arrow1.classList.remove('hot'); arrow2.classList.remove('hot');
+  for(let p=0;p<3;p++){
+    document.getElementById('fhit'+p).textContent='?';
+    document.getElementById('former'+p).classList.remove('ready');
+    document.getElementById('fchk'+p).textContent='';
+    for(let k=0;k<NMB;k++) document.getElementById(`fchip-${p}-${k}`).className='mbchip';
+  }
+}
+
+function t3frame(now){
+  if(!t3on){ return; }
+  if(!t3paused){
+    let t=((now - t3start)/1000) % T3.CYCLE;
+    t3lastT=t;
+    t3clock.textContent='t = '+t.toFixed(1)+'s';
+    barrier.classList.toggle('fire', (t>=T3.SYNC && t<T3.FLASH_END));
+
+    // --- layer 1: async packets toward MIN barrier ---
+    PKT.forEach(p=>{
+      let xPct, cls;
+      if(t < T3.START){ xPct=T3.X0; cls='travel'; }
+      else if(t < p.arrive){ xPct=lerp(T3.X0, T3.XB, (t-T3.START)/(p.arrive-T3.START)); cls='travel'; }
+      else if(t < T3.FLASH_END){ xPct=T3.XB; cls='wait'; }
+      else if(t < T3.DEPART_END){ xPct=lerp(T3.XB, T3.X1, (t-T3.FLASH_END)/(T3.DEPART_END-T3.FLASH_END)); cls='unified'; }
+      else { xPct=T3.X1; cls='unified'; }
+      p.el.style.left='calc('+xPct+'% - 54px)';
+      p.el.className='pkt '+cls;
+      p.el.querySelector('.hv').textContent = (t>=T3.SYNC? GMIN3 : p.hit);
+    });
+
+    // --- layer 2: batch formers driven by unified value ---
+    if(t < T3.FLASH_END){
+      resetFormers();
+    } else {
+      arrow2.classList.add('hot');                 // MIN -> value out
+      for(let p=0;p<3;p++) document.getElementById('fhit'+p).textContent=GMIN3;
+      // light mb chips in identical order across all three formers
+      for(let k=0;k<NMB;k++){
+        if(t >= T3.MB_START + k*T3.MB_STEP){
+          for(let p=0;p<3;p++) document.getElementById(`fchip-${p}-${k}`).classList.add('on');
+        }
+      }
+      if(t >= T3.READY){
+        arrow1.classList.add('hot');               // batch -> pipeline content
+        for(let p=0;p<3;p++){
+          document.getElementById('former'+p).classList.add('ready');
+          document.getElementById('fchk'+p).textContent='✓ batch & mb order consistent';
+          for(let k=0;k<NMB;k++) document.getElementById(`fchip-${p}-${k}`).className='mbchip fixed';
+        }
+      } else {
+        arrow1.classList.remove('hot');
+      }
+    }
+
+    // --- layer 3: the formed mb0..mb3 flow through stage0->1->2 (diagonal pipeline) ---
+    for(let s=0;s<3;s++){
+      for(let k=0;k<NMB;k++){
+        const b=document.getElementById(`pmb-${s}-${k}`);
+        const st=T3.PIPE_START + k*T3.MB_LAG + s*T3.STAGE_LAG;
+        const u=pipeActive(st, st+T3.MB_TRAVEL, t);   // handles cycle wrap (previous batch still draining)
+        if(u>=0){ b.style.left=(19+u*73)+'%'; b.style.opacity=1; }
+        else b.style.opacity=0;
+      }
+      document.getElementById('hint'+s).textContent = (t>1.4 && t<T3.PIPE_START) ? '(waiting for the batch formed in ②…)' : '';
+    }
+
+    // --- captions ---
+    if(t < T3.START) cap3.innerHTML='① The prefetch queries of the three PP ranks are <strong>issued asynchronously</strong> (they arrive at different times).';
+    else if(t < PKT[2].arrive) cap3.innerHTML='① Ranks that arrive early <b style="color:var(--amber)">wait for alignment</b> on a <span class="k">gloo CPU background thread</span> (no GPU is occupied).';
+    else if(t < T3.FLASH_END) cap3.innerHTML='① <b style="color:var(--amber)">pp2 arrives last</b> → <code>all_reduce(MIN)</code> turns 8/6/7 <strong style="color:var(--green)">into a unified 6</strong>.';
+    else if(t < T3.READY) cap3.innerHTML='② The unified <b>storage hit = 6</b> is handed to the scheduler on each rank → determines the <strong>cached prefix length / batch size / micro-batch order</strong> (mb0→mb3).';
+    else cap3.innerHTML='③ Because all three ranks receive <strong style="color:var(--green)">the same 6</strong>, they form <strong style="color:var(--green)">identical batches and mb order</strong> and feed them to the PP pipeline; execution stays continuous and uninterrupted.<br><span style="color:var(--red)">⚠ If the storage hit is not unified → batch/mb order diverges across ranks → PP scheduling falls out of step and hangs.</span>';
+  }
+  t3raf=requestAnimationFrame(t3frame);
+}
+function startTab3(restart){
+  t3on=true;
+  if(restart || !t3start){ t3start=performance.now(); t3paused=false; document.getElementById('play3').textContent='⏸ Pause'; }
+  document.getElementById('pipe').classList.remove('paused');
+  if(!t3raf) t3raf=requestAnimationFrame(t3frame);
+}
+function stopTab3(){ t3on=false; if(t3raf){ cancelAnimationFrame(t3raf); t3raf=null; } }
+
+document.getElementById('play3').onclick=function(){
+  t3paused=!t3paused;
+  t3start=performance.now() - t3lastT*1000; // re-anchor the clock at the frozen t (same on pause and on resume)
+  this.textContent=t3paused?'▶ Play':'⏸ Pause';
+  document.getElementById('pipe').classList.toggle('paused', t3paused);
+};
+document.getElementById('replay3').onclick=()=>startTab3(true);
+
+/* ============================================================
+   TAB 4 : animated walk-through of the prefetch thread pipeline.
+   Highlights each stage in sequence; the chain "lights up" as the
+   data (PrefetchOperation → Ack → completed_tokens) flows down, and
+   the matching right-side why-card glows at each MIN sync point.
+   ============================================================ */
+let t4Token=0, t4Paused=false;
+const cap4El=document.getElementById('cap4');
+// [ids-to-light, caption]
+const T4SEQ=[
+  [['t4b0'], 'The scheduler main thread enqueues a prefetch request (writeback / load), kicking off the background pipeline.'],
+  [['t4a0'], '<b>PrefetchOperation</b> is enqueued on <code>prefetch_queue</code> and handed to the background threads.'],
+  [['t4b1'], '① <b>prefetch_thread</b> calls <code>_storage_hit_query()</code> to query the number of pages hit in L3 (may differ across ranks).'],
+  [['t4m1','t4wa'], '◆ <b style="color:var(--amber)">First MIN</b>: take the minimum of storage_hit_count on <code>prefetch_hits_sync_groups</code> (group 1) → <b>the fetch range is identical on every rank</b>.'],
+  [['t4a1'], 'Requests with enough hits go into <code>prefetch_buffer</code> and proceed to the actual IO load.'],
+  [['t4b2'], '② <b>prefetch_io_aux_thread</b> uses <code>_page_transfer()</code> to move pages L3→host batch by batch; <b>each batch always produces one PrefetchAck</b> (even on error).'],
+  [['t4a2'], 'The <b>PrefetchAck</b> of each batch is enqueued on <code>prefetch_sync_queue</code>.'],
+  [['t4b3'], '③ <b>prefetch_sync_thread</b> reduces the <b>completed_tokens</b> of every ack.'],
+  [['t4m2','t4wb'], '◆ <b style="color:var(--green)">Second MIN</b>: take the minimum of completed_tokens on <code>prefetch_completion_sync_groups</code> (group 2) → <b>the prefix actually loaded is identical on every rank</b>.'],
+  [['t4a3'], 'The unified result is enqueued on <code>ack_prefetch_queue</code> and returns to the scheduler.'],
+  [['t4b4','t4wc'], 'The scheduler inserts only the prefix of length <b>completed_tokens</b> → <code>_insert_helper_host()</code>. With exactly one ack per batch, <b>the reduce counts are strictly equal → no hang</b>.'],
+];
+function t4clear(){
+  document.querySelectorAll('#scene4 .lit').forEach(e=>e.classList.remove('lit'));
+  document.querySelectorAll('#scene4 .dimmed').forEach(e=>e.classList.remove('dimmed'));
+}
+async function t4gate(my){ while(t4Paused){ await sleep(120); if(my!==t4Token) throw 0; } }
+async function t4step(ms,my){ await sleep(ms); await t4gate(my); if(my!==t4Token) throw 0; }
+async function runTab4(){
+  const my=++t4Token;
+  try{
+    while(true){
+      t4clear();
+      cap4El.innerHTML='沿数据流向下逐步点亮：两个 MIN 同步点 + 每 batch 恒定 1 个 ack。';
+      await t4step(1200,my);
+      for(const [ids,cap] of T4SEQ){
+        await t4gate(my);
+        ids.forEach(id=>document.getElementById(id).classList.add('lit'));
+        cap4El.innerHTML=cap;
+        await t4step(1900,my);
+      }
+      cap4El.innerHTML='✅ 闭环：两个 MIN（组1 命中数 + 组2 完成数）+ 每 batch 1 个 ack → <b style="color:var(--green)">PP 各 rank 的 host radix tree 严格一致</b>。';
+      await t4step(2600,my);
+    }
+  }catch(e){ /* cancelled */ }
+}
+function stopTab4(){ ++t4Token; }
+
+document.getElementById('play4').onclick=function(){
+  t4Paused=!t4Paused;
+  this.textContent=t4Paused?'▶ 播放':'⏸ 暂停';
+};
+document.getElementById('replay4').onclick=()=>{ t4Paused=false; document.getElementById('play4').textContent='⏸ 暂停'; runTab4(); };
+
+/* ============================================================
+   TAB 5 : two-request full lifecycle (PP=3 × TP=4).
+   Req A misses (GPU compute → L2 insert → L3 backup), then L2 is
+   evicted (delete, identical across ranks); Req B hits L3 and goes
+   through the two MIN syncs so every PP rank inserts the SAME prefix
+   into its host radix tree → trees stay consistent, no deadlock.
+   ============================================================ */
+const NPG=4;                       // pages tracked in the story
+const RANK_NAMES=['PP rank 0','PP rank 1','PP rank 2'];
+(function buildStory(){
+  let h='';
+  for(let p=0;p<3;p++){
+    let tps=''; for(let t=0;t<4;t++) tps+=`<span class="tp" id="s5tp-${p}-${t}"></span>`;
+    let nodes='<span class="root">host root</span>';
+    for(let i=0;i<NPG;i++) nodes+=`<div class="htnode" id="s5n-${p}-${i}">p${i}</div>`;
+    h+=`<div class="ranklane" id="s5lane${p}">
+      <div class="rankhdr"><span class="rname">${RANK_NAMES[p]}</span><span class="tps">${tps}</span>
+        <span class="rstat" id="s5stat${p}">idle</span></div>
+      <div class="htree">${nodes}</div></div>`;
+  }
+  document.getElementById('ranks').innerHTML=h;
+  let l3=''; for(let i=0;i<NPG;i++) l3+=`<div class="pg" id="s5l3-${i}">p${i}</div>`;
+  document.getElementById('l3pages').innerHTML=l3;
+})();
+
+let t5Token=0, t5Paused=false;
+const cap5=document.getElementById('cap5');
+const s5n=(p,i)=>document.getElementById(`s5n-${p}-${i}`);
+const setNode=(p,i,cls)=>{ s5n(p,i).className='htnode show '+cls; };
+const hideNode=(p,i)=>{ s5n(p,i).className='htnode'; };
+function rstat(p,txt,cls){ const e=document.getElementById('s5stat'+p); e.className='rstat '+(cls||''); e.textContent=txt; }
+function s5flag(txt,cls){ const e=document.getElementById('s5flag'); e.className='consist-flag '+(cls||''); e.innerHTML=txt; }
+function s5reset(){
+  for(let p=0;p<3;p++){
+    document.getElementById('s5lane'+p).className='ranklane';
+    rstat(p,'idle','');
+    for(let t=0;t<4;t++) document.getElementById(`s5tp-${p}-${t}`).className='tp';
+    for(let i=0;i<NPG;i++) hideNode(p,i);
+  }
+  for(let i=0;i<NPG;i++) document.getElementById('s5l3-'+i).className='pg';
+  document.getElementById('gpuBadge').className='gpu-badge';
+  document.getElementById('l3box').className='l3box';
+  document.getElementById('l3badge').className='badge'; document.getElementById('l3badge').textContent='';
+  document.getElementById('s5sync1').className='syncbadge g1';
+  document.getElementById('s5sync2').className='syncbadge g2';
+  s5flag('','');
+}
+async function t5gate(my){ while(t5Paused){ await sleep(120); if(my!==t5Token) throw 0; } }
+async function t5step(ms,my){ await sleep(ms); await t5gate(my); if(my!==t5Token) throw 0; }
+const allRanks=fn=>{ for(let p=0;p<3;p++) fn(p); };
+
+async function runTab5(){
+  const my=++t5Token;
+  try{
+    while(true){
+      s5reset();
+      cap5.innerHTML='场景 <b>PP=3 × TP=4</b>：每个 PP rank 维护一棵 <b>L2 host radix tree</b>，共享底层 <b>L3 持久化存储</b>。跟踪两个请求，看 host tree 如何保持一致。';
+      await t5step(2000,my);
+
+      /* ===== ACT 1 : Request A — miss → GPU → L2 insert → L3 backup ===== */
+      cap5.innerHTML='① <b>请求 A</b> 到达（需要 4 个 page 的前缀），3 个 PP rank 同时处理。';
+      allRanks(p=>{ document.getElementById('s5lane'+p).classList.add('active'); for(let t=0;t<4;t++) document.getElementById(`s5tp-${p}-${t}`).className='tp on'; rstat(p,'req A',''); });
+      await t5step(1700,my);
+
+      cap5.innerHTML='① 查 L2 host tree → <b style="color:var(--red)">miss</b>；查 L3 → <b style="color:var(--red)">miss</b>（存储为空）。';
+      allRanks(p=>rstat(p,'L2/L3 miss','miss'));
+      document.getElementById('l3badge').className='badge miss'; document.getElementById('l3badge').textContent='miss';
+      await t5step(1900,my);
+
+      cap5.innerHTML='① 回退到 <b>GPU 前向计算</b>，生成这 4 个 page 的 KV。';
+      document.getElementById('gpuBadge').classList.add('busy');
+      allRanks(p=>rstat(p,'compute','warn'));
+      await t5step(1800,my);
+      document.getElementById('gpuBadge').classList.remove('busy');
+
+      cap5.innerHTML='① 计算结果写入 <b>L2 host radix tree</b> → 3 个 rank <code>insert</code> <strong style="color:var(--green)">相同</strong>的前缀 p0–p3。';
+      for(let i=0;i<NPG;i++){ allRanks(p=>setNode(p,i,'inserting')); await t5step(240,my); }
+      allRanks(p=>{ for(let i=0;i<NPG;i++) setNode(p,i,'committed'); rstat(p,'L2: 4','hit'); });
+      s5flag('✓ 3 棵 host tree 同步插入 4 个 page（一致）','ok');
+      await t5step(1600,my);
+
+      cap5.innerHTML='① backup 线程把 L2 → <b>L3</b> 持久化（<code>write_backup</code> / <code>page_set</code>）。';
+      document.getElementById('l3box').classList.add('hot');
+      document.getElementById('l3badge').className='badge hit'; document.getElementById('l3badge').textContent='stored';
+      for(let i=0;i<NPG;i++){ document.getElementById('s5l3-'+i).className='pg show l3'; await t5step(220,my); }
+      s5flag('','');
+      await t5step(1500,my);
+
+      /* ===== ACT 1.5 : L2 eviction (delete consistency) ===== */
+      cap5.innerHTML='② host 内存压力 → L2 触发<strong style="color:var(--red)">淘汰</strong>（<code>evict_host</code>）。3 棵 host tree <b>完全一致</b> → 淘汰命中<strong>同一批节点</strong>；L3 仍保留。';
+      for(let i=NPG-1;i>=0;i--){ allRanks(p=>setNode(p,i,'evict')); await t5step(330,my); allRanks(p=>hideNode(p,i)); }
+      allRanks(p=>rstat(p,'L2 empty',''));
+      s5flag('✓ 3 棵 host tree 同步删除（delete 一致）','ok');
+      await t5step(2000,my);
+      s5flag('','');
+
+      /* ===== ACT 2 : Request B — L3 hit → 2 MIN syncs → consistent insert ===== */
+      cap5.innerHTML='③ <b>请求 B</b> 到达（复用 A 的前缀）。L2 host tree 已空 → <b style="color:var(--red)">L2 miss</b>，转向 L3。';
+      allRanks(p=>{ document.getElementById('s5lane'+p).classList.add('active'); rstat(p,'req B','warn'); });
+      await t5step(1700,my);
+
+      cap5.innerHTML='③ <b>prefetch_thread</b> 各 rank 向 L3 查命中页数 → 结果可能<strong style="color:var(--amber)">不同</strong>（host 视图/内存差异）：4 / 3 / 4。';
+      const hitq=[4,3,4];
+      allRanks(p=>{ rstat(p,'L3 hit '+hitq[p], hitq[p]===4?'hit':'warn'); for(let i=0;i<hitq[p];i++) setNode(p,i,'warn'); });
+      s5flag('⚠ 查询长度不一致（4/3/4）→ 若各自建树，host tree 会发散','bad');
+      await t5step(2600,my);
+
+      cap5.innerHTML='◆ <b>第一个 MIN</b> @ <code>prefetch_hits_sync_groups</code>（组1，gloo/CPU，含 TP环+PP环）：<code>all_reduce(MIN)</code> 统一查询长度 = <b>3</b>。';
+      document.getElementById('s5sync1').classList.add('fire');
+      await t5step(1500,my);
+      allRanks(p=>{ for(let i=0;i<NPG;i++) hideNode(p,i); for(let i=0;i<3;i++) setNode(p,i,'matched'); rstat(p,'match 3','hit'); });
+      s5flag('✓ 抓取范围统一 = 3 → match_prefix 逐 rank 一致','ok');
+      await t5step(2200,my);
+
+      cap5.innerHTML='③ <b>prefetch_io_aux_thread</b> 逐 batch 把 page 从 L3 拉回 L2（<code>_page_transfer</code>），每 batch 产 1 个 PrefetchAck。';
+      document.getElementById('s5sync1').classList.remove('fire');
+      for(let i=0;i<3;i++){ allRanks(p=>setNode(p,i,'inserting')); await t5step(300,my); }
+      await t5step(700,my);
+
+      cap5.innerHTML='③ 逐页加载<strong style="color:var(--amber)">部分失败</strong>：rank2 第 3 页 <code>page_get</code> 未成功 → completed_tokens = 3 / 3 / 2。';
+      const done=[3,3,2];
+      allRanks(p=>{ for(let i=0;i<3;i++){ if(i<done[p]) setNode(p,i,'committed'); else setNode(p,i,'warn'); } rstat(p,'done '+done[p], done[p]===3?'hit':'warn'); });
+      s5flag('⚠ 实际落盘不一致（3/3/2）→ 若各自插入，host tree 会发散','bad');
+      await t5step(2600,my);
+
+      cap5.innerHTML='◆ <b>第二个 MIN</b> @ <code>prefetch_completion_sync_groups</code>（组2，<b>独立 communicator</b>）：<code>all_reduce(MIN)</code> 统一 completed_tokens = <b>2</b>。';
+      document.getElementById('s5sync2').classList.add('fire');
+      await t5step(1500,my);
+
+      cap5.innerHTML='③ 各 rank 只把统一的 <b>2 个 page</b> 插入 L2 host tree（<code>_insert_helper_host</code>）→ 3 棵 host tree <strong style="color:var(--green)">再次完全一致</strong>。';
+      allRanks(p=>{ for(let i=0;i<NPG;i++) hideNode(p,i); for(let i=0;i<2;i++) setNode(p,i,'committed'); rstat(p,'L2: 2','hit'); });
+      s5flag('✓ 插入长度统一 = 2 → 3 棵 host tree 完全一致','ok');
+      await t5step(2400,my);
+
+      cap5.innerHTML='✅ 两套<strong>独立 gloo 组</strong>（组1 命中数、组2 完成数）+ 每 batch 恒 1 个 ack → 各 rank 对 host tree 的<strong>插入/删除完全一致</strong> → <b style="color:var(--green)">host radix tree 始终一致，后台 collective 不会死锁</b>。';
+      document.getElementById('s5sync1').classList.add('fire');
+      await t5step(3400,my);
+      document.getElementById('s5sync1').classList.remove('fire');
+      document.getElementById('s5sync2').classList.remove('fire');
+      await t5step(700,my);
+    }
+  }catch(e){ /* cancelled */ }
+}
+function stopTab5(){ ++t5Token; }
+
+document.getElementById('play5').onclick=function(){
+  t5Paused=!t5Paused;
+  this.textContent=t5Paused?'▶ 播放':'⏸ 暂停';
+};
+document.getElementById('replay5').onclick=()=>{ t5Paused=false; document.getElementById('play5').textContent='⏸ 暂停'; runTab5(); };
+
+/* ============================================================
+   TAB 6 : PrefetchAck count alignment & anti-hang.
+   Each storage batch in _page_transfer emits exactly one PrefetchAck,
+   and prefetch_sync_thread does one all_reduce(MIN) on set-2 per ack.
+   So #acks == #batches == #set-2 collectives, and it must be equal on
+   every rank. We compare: (good) each batch always emits 1 ack even on
+   error → counts aligned → safe; (bad) break-on-error drops an ack →
+   one rank does fewer reduces → the others block forever → hang.
+   ============================================================ */
+const T6NB=3;                 // number of storage batches / acks
+(function buildAck(){
+  let m='';
+  for(let p=0;p<3;p++){
+    let slots='';
+    for(let k=0;k<T6NB;k++) slots+=`<div class="ackchip pending" id="ack-${p}-${k}">ack${k}</div>`;
+    m+=`<div class="ackrow" id="ackrow${p}"><div class="acklabel"><b>PP rank ${p}</b> · _page_transfer</div><div class="ackslots">${slots}</div></div>`;
+  }
+  document.getElementById('ackmesh').innerHTML=m;
+  let b='';
+  for(let k=0;k<T6NB;k++) b+=`<div class="bar" id="bar${k}">barrier ${k}<span class="bcount" id="bcnt${k}">0/3</span></div>`;
+  document.getElementById('barcols').innerHTML=b;
+})();
+
+let t6Token=0;
+const cap6=document.getElementById('cap6');
+const banner6=document.getElementById('banner6');
+const ackEl=(p,k)=>document.getElementById(`ack-${p}-${k}`);
+const barEl=k=>document.getElementById('bar'+k);
+const bcnt=k=>document.getElementById('bcnt'+k);
+function t6reset(){
+  ++t6Token;
+  for(let p=0;p<3;p++){
+    document.getElementById('ackrow'+p).className='ackrow';
+    for(let k=0;k<T6NB;k++){ ackEl(p,k).className='ackchip pending'; ackEl(p,k).innerHTML=`ack${k}`; }
+  }
+  for(let k=0;k<T6NB;k++){ barEl(k).className='bar'; bcnt(k).textContent='0/3'; }
+  banner6.className='banner'; banner6.textContent='';
+}
+async function s6(ms,my){ await sleep(ms); if(my!==t6Token) throw 0; }
+
+// nacks: how many acks each rank produces (rank index -> count)
+async function playAck(nacks, label){
+  t6reset(); const my=t6Token;
+  try{
+    cap6.innerHTML=label;
+    await s6(700,my);
+    for(let k=0;k<T6NB;k++){
+      barEl(k).classList.add('waiting');
+      let arrived=0;
+      // ranks emit ack k one by one (async arrival)
+      for(let p=0;p<3;p++){
+        await s6(520,my);
+        if(k<nacks[p]){
+          ackEl(p,k).className='ackchip emit';
+          arrived++; bcnt(k).textContent=arrived+'/3';
+        }
+      }
+      await s6(400,my);
+      if(arrived===3){
+        barEl(k).className='bar fired'; bcnt(k).textContent='3/3 ✓';
+        for(let p=0;p<3;p++) ackEl(p,k).className='ackchip passed';
+        cap6.innerHTML=(window.TR||((z,e)=>z))(
+          `barrier ${k}：3 个 rank 的 ack 都到齐 → <code>all_reduce(MIN)</code> 返回 → 本轮通过。`,
+          `barrier ${k}: all 3 ranks' acks arrived → <code>all_reduce(MIN)</code> returns → this round passes.`);
+        await s6(700,my);
+      }else{
+        // a rank is missing this ack -> collective can never complete
+        barEl(k).className='bar dead'; bcnt(k).textContent=arrived+'/3 ✗';
+        for(let p=0;p<3;p++){
+          if(k<nacks[p]){ ackEl(p,k).className='ackchip wait'; document.getElementById('ackrow'+p).classList.add('blocked'); }
+          else { ackEl(p,k).className='ackchip missing'; ackEl(p,k).innerHTML=`ack${k}<span class="err">${(window.TR||((z,e)=>z))('缺失','missing')}</span>`; }
+        }
+        cap6.innerHTML=(window.TR||((z,e)=>z))(
+          `barrier ${k}：只有 <b>${arrived}/3</b> 个 rank 进了 <code>all_reduce</code>（有 rank 早早 break、少产一个 ack）→ 已到达的 rank <b style="color:var(--amber)">永远阻塞</b>在这次 collective 上。`,
+          `barrier ${k}: only <b>${arrived}/3</b> ranks entered <code>all_reduce</code> (a rank broke early and emitted one fewer ack) → the ranks that arrived are <b style="color:var(--amber)">blocked forever</b> on this collective.`);
+        banner6.className='banner bad'; banner6.textContent='💥 HANG：组2 reduce 次数不一致（3/3/2）→ collective 永不返回';
+        return;
+      }
+    }
+    banner6.className='banner ok'; banner6.textContent='✅ 安全：每 rank 都做了 '+T6NB+' 次 reduce，次数严格相等，全部对齐完成';
+    cap6.innerHTML='每个 batch 恒产 1 个 ack（出错也产）→ <b>ack 数逐 rank 相等</b> → 组2 的 collective 一一对应 → 不会 hang。';
+  }catch(e){ /* cancelled */ }
+}
+function stopTab6(){ ++t6Token; }
+document.getElementById('play6good').onclick=()=>playAck([3,3,3],
+  '<b style="color:var(--green)">正确</b>：即便某 batch 出错，<code>_page_transfer</code> 也<strong>继续循环、照常产 ack</strong> → 三个 rank 都产 3 个 ack。');
+document.getElementById('play6bad').onclick=()=>playAck([3,3,2],
+  '<b style="color:var(--red)">错误（反面教材）</b>：rank2 在 batch2 出错就 <code>break</code> → 只产 2 个 ack，比别人少一个。');
+document.getElementById('reset6').onclick=()=>{ t6reset(); cap6.innerHTML='选择场景：<b>每 batch 恒 1 ack</b> → 次数对齐、安全；<b>出错就 break</b> → ack 缺一个 → 组2 reduce 错位 → hang。'; };
+
+/* ---------- tab switching ---------- */
+const ctl1=document.querySelectorAll('.controls')[1]; // [0] is now ctl5 (story)
+const ctl2=document.getElementById('ctl2');
+const ctl3=document.getElementById('ctl3');
+const ctl4=document.getElementById('ctl4');
+const ctl5=document.getElementById('ctl5');
+const ctl6=document.getElementById('ctl6');
+document.querySelectorAll('.tab').forEach(tab=>{
+  tab.onclick=()=>{
+    document.querySelectorAll('.tab').forEach(x=>x.classList.remove('active'));
+    tab.classList.add('active');
+    const w=tab.dataset.tab;
+    document.getElementById('scene5').classList.toggle('hidden', w!=='story');
+    document.getElementById('scene1').classList.toggle('hidden', w!=='consistency');
+    document.getElementById('scene2').classList.toggle('hidden', w!=='deadlock');
+    document.getElementById('scene3').classList.toggle('hidden', w!=='skew');
+    document.getElementById('scene4').classList.toggle('hidden', w!=='threads');
+    document.getElementById('scene6').classList.toggle('hidden', w!=='ackalign');
+    ctl5.style.display = w==='story'?'flex':'none';
+    ctl1.style.display = w==='consistency'?'flex':'none';
+    ctl2.style.display = w==='deadlock'?'flex':'none';
+    ctl3.style.display = w==='skew'?'flex':'none';
+    ctl4.style.display = w==='threads'?'flex':'none';
+    ctl6.style.display = w==='ackalign'?'flex':'none';
+    if(w==='story'){ ++t1Token; stopTab3(); stopTab4(); stopTab6(); t5Paused=false; document.getElementById('play5').textContent='⏸ 暂停'; runTab5(); }
+    else if(w==='consistency'){ t1Paused=false; document.getElementById('play1').textContent='⏸ 暂停'; runTab1(); stopTab3(); stopTab4(); stopTab5(); stopTab6(); }
+    else if(w==='skew'){ ++t1Token; startTab3(true); stopTab4(); stopTab5(); stopTab6(); }
+    else if(w==='threads'){ ++t1Token; stopTab3(); stopTab5(); stopTab6(); t4Paused=false; document.getElementById('play4').textContent='⏸ 暂停'; runTab4(); }
+    else if(w==='ackalign'){ ++t1Token; stopTab3(); stopTab4(); stopTab5(); t6reset(); }
+    else{ ++t1Token; stopTab3(); stopTab4(); stopTab5(); stopTab6(); }
+  };
+});
+ctl1.style.display='none';
+ctl2.style.display='none';
+ctl3.style.display='none';
+ctl4.style.display='none';
+ctl6.style.display='none';
+resetMesh2();
+runTab5();
+</script>
+
+<script>
+/* ============================================================
+   i18n: translate by text-content (robust to innerHTML normalization).
+   PAIRS = [zhHTML, enHTML]; keys derived from stripped textContent.
+   A MutationObserver re-translates dynamic captions on the fly.
+   ============================================================ */
+(function(){
+  const PAIRS = [
+    // header
+    ['HiCache × Pipeline Parallel：树一致性 & 防死锁','HiCache × Pipeline Parallel: Tree Consistency & Deadlock Avoidance'],
+    ['拓扑 <b>PP=3 × TP=8 = 24 ranks</b> · 行=TP 组、列=PP 组 · MIN all-reduce 保证 radix tree 一致 · 2 套 gloo 组避免后台 collective 死锁',
+     'Topology <b>PP=3 × TP=8 = 24 ranks</b> · rows = TP groups, cols = PP groups · MIN all-reduce keeps radix trees identical · 2 gloo group-sets avoid background-collective deadlock'],
+    // tabs
+    ['① 两请求全流程（L3 命中/未命中 · host tree 一致）','① Two-Request Lifecycle (L3 miss/hit · host-tree consistency)'],
+    ['② 树一致性（自动播放）','② Tree Consistency (auto-play)'],
+    ['③ 为什么 2 个组不死锁','③ Why 2 Groups Avoid Deadlock'],
+    ['④ 异步时间差 × MIN 统一步调','④ Async Skew × MIN Lockstep'],
+    ['⑤ 线程关系 & 树一致性','⑤ Thread Relationships & Consistency'],
+    ['⑥ PrefetchAck 对齐 & 防 hang','⑥ PrefetchAck Alignment & Anti-Hang'],
+    // tab6 note / legend / barrier label / scenario captions / banners
+    ['每个 <b>storage batch</b> 在 <code>_page_transfer</code> 里恒产 <b>1 个 PrefetchAck</b>；<code>prefetch_sync_thread</code> 对<strong>每个 ack</strong> 在组2 做一次 <code>all_reduce(MIN)</code>。所以 <b>ack 数 = batch 数 = 组2 collective 次数</b>，必须逐 rank 相等。',
+     'Each <b>storage batch</b> in <code>_page_transfer</code> always emits <b>exactly 1 PrefetchAck</b>; <code>prefetch_sync_thread</code> does one <code>all_reduce(MIN)</code> on set 2 <strong>per ack</strong>. So <b>#acks = #batches = #set-2 collectives</b>, and it must be equal on every rank.'],
+    ['<span class="sw" style="background:var(--blue)"></span>ack 已产出（参与本轮 reduce）','<span class="sw" style="background:var(--blue)"></span>ack emitted (joins this reduce)'],
+    ['<span class="sw" style="background:var(--green)"></span>barrier 凑齐 3/3 → 通过','<span class="sw" style="background:var(--green)"></span>barrier reaches 3/3 → pass'],
+    ['<span class="sw" style="background:var(--amber)"></span>已到达，等待缺席方','<span class="sw" style="background:var(--amber)"></span>arrived, waiting for the absent rank'],
+    ['<span class="sw" style="background:var(--red)"></span>缺失 ack → 永远等不到','<span class="sw" style="background:var(--red)"></span>missing ack → never arrives'],
+    ['◆ <code>all_reduce(MIN)</code> @ 组2（prefetch_completion_sync_groups）· 每个 ack 一次 barrier',
+     '◆ <code>all_reduce(MIN)</code> @ set 2 (prefetch_completion_sync_groups) · one barrier per ack'],
+    ['选择场景：<b>每 batch 恒 1 ack</b> → 次数对齐、安全；<b>出错就 break</b> → ack 缺一个 → 组2 reduce 错位 → hang。',
+     'Pick a scenario: <b>one ack per batch</b> → counts aligned, safe; <b>break on error</b> → one ack missing → set-2 reduces misalign → hang.'],
+    ['<b style="color:var(--green)">正确</b>：即便某 batch 出错，<code>_page_transfer</code> 也<strong>继续循环、照常产 ack</strong> → 三个 rank 都产 3 个 ack。',
+     '<b style="color:var(--green)">Correct</b>: even if a batch errors, <code>_page_transfer</code> <strong>keeps looping and still emits the ack</strong> → all three ranks emit 3 acks.'],
+    ['<b style="color:var(--red)">错误（反面教材）</b>：rank2 在 batch2 出错就 <code>break</code> → 只产 2 个 ack，比别人少一个。',
+     '<b style="color:var(--red)">Wrong (anti-pattern)</b>: rank2 hits an error at batch2 and <code>break</code>s → emits only 2 acks, one fewer than the others.'],
+    ['▶ 正确（每 batch 恒 1 ack）','▶ Correct (one ack per batch)'],
+    ['▶ 错误（出错 break → ack 缺失）','▶ Wrong (break on error → missing ack)'],
+    ['每个 batch 恒产 1 个 ack（出错也产）→ <b>ack 数逐 rank 相等</b> → 组2 的 collective 一一对应 → 不会 hang。',
+     'Each batch always emits one ack (even on error) → <b>ack counts equal across ranks</b> → set-2 collectives match one-to-one → no hang.'],
+    ['✅ 安全：每 rank 都做了 3 次 reduce，次数严格相等，全部对齐完成','✅ Safe: every rank did 3 reduces, counts strictly equal, all aligned and complete'],
+    ['💥 HANG：组2 reduce 次数不一致（3/3/2）→ collective 永不返回','💥 HANG: set-2 reduce counts differ (3/3/2) → the collective never returns'],
+    // tab5 legend chips
+    ['<span class="sw" style="background:var(--blue)"></span>GPU 计算 / 插入中','<span class="sw" style="background:var(--blue)"></span>GPU compute / inserting'],
+    ['<span class="sw" style="background:var(--cyan)"></span>match 命中前缀','<span class="sw" style="background:var(--cyan)"></span>matched prefix'],
+    ['<span class="sw" style="background:var(--amber)"></span>各 rank 不一致（待 MIN 统一）','<span class="sw" style="background:var(--amber)"></span>diverged per rank (await MIN)'],
+    ['<span class="sw" style="background:var(--green)"></span>已提交 / 一致','<span class="sw" style="background:var(--green)"></span>committed / consistent'],
+    ['<span class="sw" style="background:var(--red)"></span>未命中 / 淘汰删除','<span class="sw" style="background:var(--red)"></span>miss / evicted'],
+    // tab5 static labels
+    ['<b>L3 持久化存储</b>（storage backend，3 个 rank 共享视图）',
+     '<b>L3 persistent storage</b> (storage backend, shared view across 3 ranks)'],
+    ['GPU 计算','GPU compute'],
+    ['◆ MIN 组1 · prefetch_hits_sync_groups · storage_hit_count','◆ MIN set 1 · prefetch_hits_sync_groups · storage_hit_count'],
+    ['◆ MIN 组2 · prefetch_completion_sync_groups · completed_tokens','◆ MIN set 2 · prefetch_completion_sync_groups · completed_tokens'],
+    // tab5 consistency flags
+    ['✓ 3 棵 host tree 同步插入 4 个 page（一致）','✓ all 3 host trees insert 4 pages in sync (consistent)'],
+    ['✓ 3 棵 host tree 同步删除（delete 一致）','✓ all 3 host trees delete in sync (consistent)'],
+    ['⚠ 查询长度不一致（4/3/4）→ 若各自建树，host tree 会发散','⚠ query lengths differ (4/3/4) → if each rank builds its tree independently, the host trees diverge'],
+    ['✓ 抓取范围统一 = 3 → match_prefix 逐 rank 一致','✓ fetch range unified = 3 → match_prefix identical per rank'],
+    ['⚠ 实际落盘不一致（3/3/2）→ 若各自插入，host tree 会发散','⚠ actual loads differ (3/3/2) → if each rank inserts independently, the host trees diverge'],
+    ['✓ 插入长度统一 = 2 → 3 棵 host tree 完全一致','✓ insert length unified = 2 → all 3 host trees identical'],
+    // tab5 step captions
+    ['场景 <b>PP=3 × TP=4</b>：每个 PP rank 维护一棵 <b>L2 host radix tree</b>，共享底层 <b>L3 持久化存储</b>。跟踪两个请求，看 host tree 如何保持一致。',
+     'Scenario <b>PP=3 × TP=4</b>: each PP rank maintains its own <b>L2 host radix tree</b>, backed by shared <b>L3 persistent storage</b>. We follow two requests and see how the host trees stay consistent.'],
+    ['① <b>请求 A</b> 到达（需要 4 个 page 的前缀），3 个 PP rank 同时处理。',
+     '① <b>Request A</b> arrives (needs a 4-page prefix); all 3 PP ranks process it together.'],
+    ['① 查 L2 host tree → <b style="color:var(--red)">miss</b>；查 L3 → <b style="color:var(--red)">miss</b>（存储为空）。',
+     '① Query L2 host tree → <b style="color:var(--red)">miss</b>; query L3 → <b style="color:var(--red)">miss</b> (storage empty).'],
+    ['① 回退到 <b>GPU 前向计算</b>，生成这 4 个 page 的 KV。',
+     '① Fall back to <b>GPU forward compute</b> to produce the KV for these 4 pages.'],
+    ['① 计算结果写入 <b>L2 host radix tree</b> → 3 个 rank <code>insert</code> <strong style="color:var(--green)">相同</strong>的前缀 p0–p3。',
+     '① Results are written into the <b>L2 host radix tree</b> → all 3 ranks <code>insert</code> the <strong style="color:var(--green)">same</strong> prefix p0–p3.'],
+    ['① backup 线程把 L2 → <b>L3</b> 持久化（<code>write_backup</code> / <code>page_set</code>）。',
+     '① The backup thread persists L2 → <b>L3</b> (<code>write_backup</code> / <code>page_set</code>).'],
+    ['② host 内存压力 → L2 触发<strong style="color:var(--red)">淘汰</strong>（<code>evict_host</code>）。3 棵 host tree <b>完全一致</b> → 淘汰命中<strong>同一批节点</strong>；L3 仍保留。',
+     '② Host-memory pressure → L2 <strong style="color:var(--red)">eviction</strong> (<code>evict_host</code>). The 3 host trees are <b>identical</b> → eviction hits the <strong>same nodes</strong>; L3 keeps them.'],
+    ['③ <b>请求 B</b> 到达（复用 A 的前缀）。L2 host tree 已空 → <b style="color:var(--red)">L2 miss</b>，转向 L3。',
+     '③ <b>Request B</b> arrives (reuses A\u2019s prefix). The L2 host tree is empty → <b style="color:var(--red)">L2 miss</b>, fall through to L3.'],
+    ['③ <b>prefetch_thread</b> 各 rank 向 L3 查命中页数 → 结果可能<strong style="color:var(--amber)">不同</strong>（host 视图/内存差异）：4 / 3 / 4。',
+     '③ <b>prefetch_thread</b> on each rank queries L3 for its hit-page count → results may <strong style="color:var(--amber)">differ</strong> (differences in host view / memory): 4 / 3 / 4.'],
+    ['◆ <b>第一个 MIN</b> @ <code>prefetch_hits_sync_groups</code>（组1，gloo/CPU，含 TP环+PP环）：<code>all_reduce(MIN)</code> 统一查询长度 = <b>3</b>。',
+     '◆ <b>First MIN</b> @ <code>prefetch_hits_sync_groups</code> (set 1, gloo/CPU, TP+PP rings): <code>all_reduce(MIN)</code> unifies the query length = <b>3</b>.'],
+    ['③ <b>prefetch_io_aux_thread</b> 逐 batch 把 page 从 L3 拉回 L2（<code>_page_transfer</code>），每 batch 产 1 个 PrefetchAck。',
+     '③ <b>prefetch_io_aux_thread</b> pulls pages L3→L2 batch by batch (<code>_page_transfer</code>), one PrefetchAck per batch.'],
+    ['③ 逐页加载<strong style="color:var(--amber)">部分失败</strong>：rank2 第 3 页 <code>page_get</code> 未成功 → completed_tokens = 3 / 3 / 2。',
+     '③ Per-page loading <strong style="color:var(--amber)">partially fails</strong>: <code>page_get</code> for rank2\u2019s 3rd page does not succeed → completed_tokens = 3 / 3 / 2.'],
+    ['◆ <b>第二个 MIN</b> @ <code>prefetch_completion_sync_groups</code>（组2，<b>独立 communicator</b>）：<code>all_reduce(MIN)</code> 统一 completed_tokens = <b>2</b>。',
+     '◆ <b>Second MIN</b> @ <code>prefetch_completion_sync_groups</code> (set 2, <b>independent communicator</b>): <code>all_reduce(MIN)</code> unifies completed_tokens = <b>2</b>.'],
+    ['③ 各 rank 只把统一的 <b>2 个 page</b> 插入 L2 host tree（<code>_insert_helper_host</code>）→ 3 棵 host tree <strong style="color:var(--green)">再次完全一致</strong>。',
+     '③ Each rank inserts only the unified <b>2 pages</b> into its L2 host tree (<code>_insert_helper_host</code>) → all 3 host trees are <strong style="color:var(--green)">identical again</strong>.'],
+    ['✅ 两套<strong>独立 gloo 组</strong>（组1 命中数、组2 完成数）+ 每 batch 恒 1 个 ack → 各 rank 对 host tree 的<strong>插入/删除完全一致</strong> → <b style="color:var(--green)">host radix tree 始终一致，后台 collective 不会死锁</b>。',
+     '✅ Two <strong>independent gloo group-sets</strong> (set 1 hit count, set 2 completed tokens) + exactly one ack per batch → every rank\u2019s <strong>inserts/deletes are identical</strong> → <b style="color:var(--green)">host radix trees stay consistent and background collectives never deadlock</b>.'],
+    // buttons
+    ['⏸ 暂停','⏸ Pause'],['▶ 播放','▶ Play'],['⟲ 重播','⟲ Replay'],['重置','Reset'],
+    ['▶ 1 套组（死锁）','▶ 1 group set (deadlock)'],['▶ 2 套组（安全）','▶ 2 group sets (safe)'],
+    // tab1 legend + tree title + init caption
+    ['<span class="sw" style="background:var(--amber)"></span>命中数被截断（不一致）','<span class="sw" style="background:var(--amber)"></span>hit count truncated (inconsistent)'],
+    ['<span class="sw" style="background:var(--blue)"></span>TP 组内 MIN 后','<span class="sw" style="background:var(--blue)"></span>after MIN within TP group'],
+    ['<span class="sw" style="background:var(--green)"></span>PP 组内 MIN 后（全局一致）','<span class="sw" style="background:var(--green)"></span>after MIN within PP group (global)'],
+    ['所有 24 个 rank 共享同一棵 radix tree','all 24 ranks share one radix tree'],
+    ['自动播放中…','auto-playing…'],
+    // tab1 captions
+    ['拓扑 <b>PP=3 × TP=8 = 24 个 rank</b>：每个 PP stage 下挂 8 个 TP rank。',
+     'Topology <b>PP=3 × TP=8 = 24 ranks</b>: each PP stage holds 8 TP ranks.'],
+    ['① 各 rank <span class="k">独立</span>向 L3 查询前缀命中。<b style="color:var(--amber)">注意 r10、r15 因 host 内存压力被截断</b>（6 / 7 页）。',
+     '① Each rank <span class="k">independently</span> queries L3 for prefix hits. <b style="color:var(--amber)">Note r10 & r15 are truncated by host-memory pressure</b> (6 / 7 pages).'],
+    ['② 若各 rank 按自己的命中数建 radix tree → 树高不一致 → 后续 PP 集合通信 <b style="color:var(--red)">shape mismatch → crash</b>。',
+     '② If each rank builds its radix tree from its own hit count → tree heights differ → a later PP collective hits a <b style="color:var(--red)">shape mismatch → crash</b>.'],
+    ['③ 第一步：在 <span class="k">TP 组（每一行 8 个 rank）</span>内 <code>all_reduce(MIN)</code>。',
+     '③ Step 1: <code>all_reduce(MIN)</code> within each <span class="k">TP group (a row of 8 ranks)</span>.'],
+    ['③ TP 组归约后：<b>每一行变得一致</b>（PP0=8, PP1=6, PP2=8 = 各行最小值）。',
+     '③ After TP reduce: <b>each row is uniform</b> (PP0=8, PP1=6, PP2=8 = per-row min).'],
+    ['④ 第二步：在 <span class="k">PP 组（每一列 3 个 rank）</span>内 <code>all_reduce(MIN)</code> → 收敛到全局最小值。',
+     '④ Step 2: <code>all_reduce(MIN)</code> within each <span class="k">PP group (a column of 3 ranks)</span> → converge to the global minimum.'],
+    ['④ PP 组归约后：<b style="color:var(--green)">全部 24 个 rank 命中数 = 6</b>（最长公共前缀）。',
+     '④ After PP reduce: <b style="color:var(--green)">all 24 ranks hit = 6</b> (longest common prefix).'],
+    ['⑤ 所有 rank 都只 prefetch / 建树到 6 → <span style="color:var(--green)">24 个 rank 的 radix tree 完全一致 ✓</span>',
+     '⑤ Every rank prefetches / builds the tree only up to 6 → <span style="color:var(--green)">all 24 radix trees are identical ✓</span>'],
+    // tab2 legend + note + groups + init + captions + banners
+    ['<span class="sw" style="background:var(--purple)"></span><b>prefetch_thread</b>（独立后台线程）· reduce(storage_hit_count)','<span class="sw" style="background:var(--purple)"></span><b>prefetch_thread</b> (independent background thread) · reduce(storage_hit_count)'],
+    ['<span class="sw" style="background:var(--cyan)"></span><b>prefetch_sync_thread</b>（独立后台线程）· reduce(completed_tokens)','<span class="sw" style="background:var(--cyan)"></span><b>prefetch_sync_thread</b> (independent background thread) · reduce(completed_tokens)'],
+    ['每个 cell = 1 个 rank，内含 2 个独立后台线程（小圆点 ●A ●B）。每一行是一个 <b>TP communicator</b>，每一列是一个 <b>PP communicator</b>。',
+     'Each cell = 1 rank, holding 2 independent background threads (dots ●A ●B). Each row is a <b>TP communicator</b>, each column a <b>PP communicator</b>.'],
+    ['<b>prefetch_hits_sync_groups</b><br>命中页数归约组（含 TP 环 + PP 环）<br><span style="font-size:11px">reduce(storage_hit_count)</span>',
+     '<b>prefetch_hits_sync_groups</b><br>hit-count reduce set (TP rings + PP rings)<br><span style="font-size:11px">reduce(storage_hit_count)</span>'],
+    ['<b>prefetch_completion_sync_groups</b><br>完成 token 归约组（含 TP 环 + PP 环）<br><span style="font-size:11px">reduce(completed_tokens)</span>',
+     '<b>prefetch_completion_sync_groups</b><br>completed-token reduce set (TP rings + PP rings)<br><span style="font-size:11px">reduce(completed_tokens)</span>'],
+    ['选择场景：用 <b>1 套组</b> 会死锁，用 <b>2 套组</b> 则安全。','Pick a scenario: <b>1 group set</b> deadlocks, <b>2 group sets</b> are safe.'],
+    ['只有 <b>1 套组</b>：prefetch_thread(A) 与 prefetch_sync_thread(B) 共用同一个 communicator 集。',
+     'Only <b>1 group set</b>: prefetch_thread(A) and prefetch_sync_thread(B) share the same communicator set.'],
+    ['两个后台线程<b>独立调度、顺序不定</b>：同一个 TP 环里，有的 rank 先发 A，有的先发 B。',
+     'The two background threads are <b>scheduled independently, order unpredictable</b>: within one TP ring some ranks post A first, others post B first.'],
+    ['同一个 communicator 上各 rank 提交的 collective <b style="color:var(--red)">不是同一个</b>（A 与 B 错位）→ rendezvous 永远配不上。',
+     'On the same communicator the collectives submitted by different ranks are <b style="color:var(--red)">not the same</b> (A vs B misaligned) → rendezvous never matches.'],
+    ['只要任一 communicator 上 A/B 交错，该环就死锁 → 全局 PP/TP 通信连环卡住。',
+     'If A and B interleave on any communicator, that ring deadlocks → the hang cascades to all PP/TP communication.'],
+    ['💥 DEADLOCK — 整个 24-rank job 卡死','💥 DEADLOCK — the whole 24-rank job hangs'],
+    ['用 <b>2 套独立组</b>：<b style="color:var(--purple)">A 永远走 prefetch_hits_sync_groups</b>，<b style="color:var(--cyan)">B 永远走 prefetch_completion_sync_groups</b>。',
+     'With <b>2 independent group sets</b>: <b style="color:var(--purple)">A always uses prefetch_hits_sync_groups</b>, <b style="color:var(--cyan)">B always uses prefetch_completion_sync_groups</b>.'],
+    ['第一波：所有 rank 的 <b>prefetch_thread</b> 只在 <code>prefetch_hits_sync_groups</code> 上提交 A → 序列一致。',
+     'Wave 1: every rank\u2019s <b>prefetch_thread</b> posts A only on <code>prefetch_hits_sync_groups</code> → consistent order.'],
+    ['✓ TP 环 + PP 环上 A 全部到齐 → 第一波归约完成。','✓ A arrives on every TP ring + PP ring → wave 1 reduce done.'],
+    ['第二波：所有 rank 的 <b>prefetch_sync_thread</b> 只在 <code>prefetch_completion_sync_groups</code> 上提交 B → 序列一致。',
+     'Wave 2: every rank\u2019s <b>prefetch_sync_thread</b> posts B only on <code>prefetch_completion_sync_groups</code> → consistent order.'],
+    ['每个 communicator 上的 collective 序列在所有 rank <b style="color:var(--green)">完全一致</b>（A→组1、B→组2 不交叉）→ 不会死锁。',
+     'The collective sequence on each communicator is <b style="color:var(--green)">identical across ranks</b> (A→set1, B→set2, never crossing) → no deadlock.'],
+    ['✅ 安全 — 24 个 rank 全部对齐完成','✅ Safe — all 24 ranks aligned and complete'],
+    // tab3 titles / conduits / flow-note / lane hint / captions / formers
+    ['③ 主 PP 流水线执行<strong>时序</strong> <span class="tag gpu">NCCL · GPU</span> <span style="color:var(--muted);font-size:11px;">时序连续、错峰流动，<strong style="color:var(--green)">不被后台 prefetch 同步打断</strong></span>',
+     '③ Main PP pipeline execution <strong>timing</strong> <span class="tag gpu">NCCL · GPU</span> <span style="color:var(--muted);font-size:11px;">continuous, staggered flow, <strong style="color:var(--green)">never interrupted by background prefetch sync</strong></span>'],
+    ['↑ 流水线跑的正是②组好的 <strong>mb0→mb3</strong>，沿 stage0→1→2 错峰对角推进',
+     '↑ The pipeline runs exactly the <strong>mb0→mb3</strong> composed in ②, advancing diagonally stage0→1→2'],
+    ['▲ 组好的 <b>batch &amp; micro-batch 顺序</b> 喂给流水线（内容）','▲ The composed <b>batch &amp; micro-batch order</b> feeds the pipeline (content)'],
+    ['② 三个 PP rank 用<strong>同一个 storage hit</strong> 组 batch（内容必须逐 rank 一致）',
+     '② The three PP ranks compose the batch from <strong>the same storage hit</strong> (content must be identical on every rank)'],
+    ['▲ <code>all_reduce(MIN)</code> 输出统一值 <b>6</b> → 决定 batch size','▲ <code>all_reduce(MIN)</code> outputs the unified value <b>6</b> → determines batch size'],
+    ['① 异步 prefetch 查询 → <code>all_reduce(MIN)</code> <span class="tag cpu">gloo · CPU 后台线程</span>',
+     '① Async prefetch query → <code>all_reduce(MIN)</code> <span class="tag cpu">gloo · CPU background thread</span>'],
+    ['（等待②组好的 batch…）','(waiting for batch from ②…)'],
+    ['① 三个 PP rank 的 prefetch 查询<strong>异步发起</strong>（到达时刻不同）。','① The three PP ranks issue prefetch queries <strong>asynchronously</strong> (different arrival times).'],
+    ['① 先到的 rank 在 <span class="k">gloo CPU 后台线程</span>上<b style="color:var(--amber)">等待对齐</b>（不占 GPU）。',
+     '① Earlier ranks <b style="color:var(--amber)">wait to align</b> on the <span class="k">gloo CPU background thread</span> (no GPU use).'],
+    ['① <b style="color:var(--amber)">pp2 最慢</b>到达 → <code>all_reduce(MIN)</code> 把 8/6/7 <strong style="color:var(--green)">统一成 6</strong>。',
+     '① <b style="color:var(--amber)">pp2 is slowest</b> to arrive → <code>all_reduce(MIN)</code> unifies 8/6/7 <strong style="color:var(--green)">into 6</strong>.'],
+    ['② 统一后的 <b>storage hit = 6</b> 下发给各 rank 调度器 → 决定<strong>已缓存前缀长度 / batch size / micro-batch 顺序</strong>（mb0→mb3）。',
+     '② The unified <b>storage hit = 6</b> goes to each rank\u2019s scheduler → determines <strong>cached prefix length / batch size / micro-batch order</strong> (mb0→mb3).'],
+    ['③ 三个 rank 因拿到<strong style="color:var(--green)">同一个 6</strong> 而组出<strong style="color:var(--green)">完全一致的 batch 与 mb 顺序</strong>，喂给 PP 流水线；执行时序连续不被打断。<br><span style="color:var(--red)">⚠ 若 storage hit 不统一 → batch/mb 顺序逐 rank 发散 → PP 调度错位、卡死。</span>',
+     '③ Because all three ranks get <strong style="color:var(--green)">the same 6</strong>, they compose <strong style="color:var(--green)">identical batches and mb order</strong> to feed the PP pipeline; execution timing stays continuous and uninterrupted.<br><span style="color:var(--red)">⚠ If the storage hit were not unified → batch/mb order would diverge across ranks → PP scheduling mismatch and hang.</span>'],
+    // formers
+    ['调度器 · PP rank 0','Scheduler · PP rank 0'],['调度器 · PP rank 1','Scheduler · PP rank 1'],['调度器 · PP rank 2','Scheduler · PP rank 2'],
+    ['已缓存前缀 storage hit = ','cached prefix storage hit = '],
+    [' 页 → 决定 batch 组成',' pages → determines batch'],
+    ['✓ batch & mb 顺序一致','✓ identical batch & mb order'],
+    // mesh labels
+    ['<b>PP stage 0</b><br>(TP 组)','<b>PP stage 0</b><br>(TP group)'],
+    ['<b>PP stage 1</b><br>(TP 组)','<b>PP stage 1</b><br>(TP group)'],
+    ['<b>PP stage 2</b><br>(TP 组)','<b>PP stage 2</b><br>(TP group)'],
+    ['PP 组(列)<br>每列跨 3 个 stage →','PP groups (cols)<br>each spans 3 stages →'],
+    // tab4 flow boxes
+    ['调度器 Scheduler <span class="pin">主线程</span>','Scheduler <span class="pin">main thread</span>'],
+    ['发起 prefetch 请求（writeback / load）','Issues prefetch requests (writeback / load)'],
+    ['▼ <b>prefetch_queue</b>（PrefetchOperation）','▼ <b>prefetch_queue</b> (PrefetchOperation)'],
+    ['① prefetch_thread <span class="pin">storage-hit 线程</span>','① prefetch_thread <span class="pin">storage-hit thread</span>'],
+    ['<code>_storage_hit_query()</code> 查询 L3 命中页数；命中足够→放 prefetch_buffer，不足→prefetch_revoke_queue',
+     '<code>_storage_hit_query()</code> queries L3 hit pages; enough hits → prefetch_buffer, too few → prefetch_revoke_queue'],
+    ['◆ all_reduce(MIN) storage_hit_count <small>@ prefetch_hits_sync_groups（组1，gloo/CPU，含 TP 环 + PP 环）</small>',
+     '◆ all_reduce(MIN) storage_hit_count <small>@ prefetch_hits_sync_groups (set 1, gloo/CPU, TP rings + PP rings)</small>'],
+    ['▼ <b>prefetch_buffer</b>','▼ <b>prefetch_buffer</b>'],
+    ['② prefetch_io_aux_thread <span class="pin">IO 加载线程</span>','② prefetch_io_aux_thread <span class="pin">IO load thread</span>'],
+    ['<code>_page_transfer()</code> 逐 batch 把页从 L3 读入 host；累加 <b>completed_tokens</b>；<b>每个 storage batch 产生 1 个 PrefetchAck</b>（出错也照常产生）',
+     '<code>_page_transfer()</code> loads pages L3→host batch by batch; accumulates <b>completed_tokens</b>; <b>each storage batch emits exactly 1 PrefetchAck</b> (even on error)'],
+    ['▼ <b>prefetch_sync_queue</b>（PrefetchAck）','▼ <b>prefetch_sync_queue</b> (PrefetchAck)'],
+    ['③ prefetch_sync_thread <span class="pin">completion-token 线程</span>','③ prefetch_sync_thread <span class="pin">completion-token thread</span>'],
+    ['对每个 ack 的 <b>completed_tokens</b> 做归约','Reduces <b>completed_tokens</b> of every ack'],
+    ['◆ all_reduce(MIN) completed_tokens <small>@ prefetch_completion_sync_groups（组2，gloo/CPU，含 TP 环 + PP 环）</small>',
+     '◆ all_reduce(MIN) completed_tokens <small>@ prefetch_completion_sync_groups (set 2, gloo/CPU, TP rings + PP rings)</small>'],
+    ['▼ <b>ack_prefetch_queue</b>','▼ <b>ack_prefetch_queue</b>'],
+    ['调度器写入 host radix tree','Scheduler inserts into host radix tree'],
+    ['只插入 <b>completed_tokens</b> 长度的前缀 → <code>_insert_helper_host()</code>','Inserts only the <b>completed_tokens</b>-long prefix → <code>_insert_helper_host()</code>'],
+    ['为什么 MIN(storage_hit) 一致？','Why does MIN(storage_hit) ensure consistency?'],
+    ['各 rank 命中可能不同（host 内存截断、L3 视图差异）。MIN 取<b>最长公共可命中前缀</b> → 所有 rank <b>抓取范围一致</b>，不会各抓不同长度。',
+     'Hit counts may differ across ranks (host-memory truncation, differing L3 views). MIN selects the <b>longest prefix that every rank can hit</b> → every rank <b>fetches the same range</b> instead of each fetching a different length.'],
+    ['为什么 MIN(completed_tokens) 一致？','Why does MIN(completed_tokens) ensure consistency?'],
+    ['即便抓取范围一致，实际逐页加载仍可能<b>部分失败</b>（<code>page_get</code> 返回 n≠batch）。MIN 只提交<b>所有 rank 都成功落盘的最长公共前缀</b> → 写入 host tree 的长度逐 rank 相同。',
+     'Even with the same fetch range, per-page loads can still <b>partially fail</b> (<code>page_get</code> returns n≠batch). MIN commits only the <b>longest common prefix that every rank loaded successfully</b> → the length inserted into the host tree is identical on every rank.'],
+    ['为什么不会 hang？','Why no hang?'],
+    ['每个 storage batch <b>都产生且仅产生一个 PrefetchAck</b>（即使出错也照常产生）→ 每个 rank 参与的 reduce <b>次数严格相等</b>，collective 一一对齐。两个 MIN 一起保证：<b>插入 host tree 的前缀逐 rank 完全相同 → 树一致</b>。',
+     'Each storage batch <b>emits exactly one PrefetchAck</b> (even on error) → every rank joins the <b>same number of reduces</b>, collectives align one-to-one. The two MINs together guarantee: <b>the prefix inserted into the host tree is identical per rank → trees are consistent</b>.'],
+    ['两个 MIN 同步点（组1 命中数、组2 完成数）+ 每 batch 恒定 1 个 ack，共同保证 PP 各 rank 的 host radix tree 严格一致。',
+     'Two MIN sync points (set 1 = hit count, set 2 = completed tokens) + exactly one ack per batch together keep every PP rank\u2019s host radix tree strictly identical.'],
+    // tab4 animated step captions
+    ['沿数据流向下逐步点亮：两个 MIN 同步点 + 每 batch 恒定 1 个 ack。',
+     'Steps light up in order along the data flow: two MIN sync points + exactly one ack per batch.'],
+    ['调度器主线程把 prefetch 请求（writeback / load）放入队列，触发后台流水线。',
+     'The scheduler main thread enqueues a prefetch request (writeback / load), kicking off the background pipeline.'],
+    ['<b>PrefetchOperation</b> 入队 <code>prefetch_queue</code>，交给后台线程处理。',
+     'A <b>PrefetchOperation</b> enters <code>prefetch_queue</code>, handed to the background threads.'],
+    ['① <b>prefetch_thread</b> 调 <code>_storage_hit_query()</code> 查询 L3 命中页数（各 rank 可能不同）。',
+     '① <b>prefetch_thread</b> calls <code>_storage_hit_query()</code> to query L3 hit pages (may differ per rank).'],
+    ['◆ <b style="color:var(--amber)">第一个 MIN</b>：在 <code>prefetch_hits_sync_groups</code>（组1）对 storage_hit_count 取最小 → <b>抓取范围逐 rank 一致</b>。',
+     '◆ <b style="color:var(--amber)">First MIN</b>: take the min of storage_hit_count on <code>prefetch_hits_sync_groups</code> (set 1) → <b>the fetch range is identical per rank</b>.'],
+    ['命中足够的请求落入 <code>prefetch_buffer</code>，进入实际 IO 加载。',
+     'Requests with enough hits drop into <code>prefetch_buffer</code> for the actual IO load.'],
+    ['② <b>prefetch_io_aux_thread</b> 用 <code>_page_transfer()</code> 逐 batch 把页 L3→host；<b>每个 batch 恒产生 1 个 PrefetchAck</b>（出错也产生）。',
+     '② <b>prefetch_io_aux_thread</b> uses <code>_page_transfer()</code> to move pages L3→host batch by batch; <b>each batch always emits exactly one PrefetchAck</b> (even on error).'],
+    ['每个 batch 的 <b>PrefetchAck</b> 入队 <code>prefetch_sync_queue</code>。',
+     'Each batch\u2019s <b>PrefetchAck</b> enters <code>prefetch_sync_queue</code>.'],
+    ['③ <b>prefetch_sync_thread</b> 对每个 ack 的 <b>completed_tokens</b> 做归约。',
+     '③ <b>prefetch_sync_thread</b> reduces the <b>completed_tokens</b> of every ack.'],
+    ['◆ <b style="color:var(--green)">第二个 MIN</b>：在 <code>prefetch_completion_sync_groups</code>（组2）对 completed_tokens 取最小 → <b>真正落盘前缀逐 rank 一致</b>。',
+     '◆ <b style="color:var(--green)">Second MIN</b>: take the min of completed_tokens on <code>prefetch_completion_sync_groups</code> (set 2) → <b>the actually-loaded prefix is identical per rank</b>.'],
+    ['统一后的结果入队 <code>ack_prefetch_queue</code> 回到调度器。',
+     'The unified result enters <code>ack_prefetch_queue</code> back to the scheduler.'],
+    ['调度器只插入 <b>completed_tokens</b> 长度的前缀 → <code>_insert_helper_host()</code>。每 batch 恒 1 个 ack，<b>reduce 次数严格相等 → 不会 hang</b>。',
+     'The scheduler inserts only the <b>completed_tokens</b>-long prefix → <code>_insert_helper_host()</code>. One ack per batch means <b>equal reduce counts → no hang</b>.'],
+    ['✅ 闭环：两个 MIN（组1 命中数 + 组2 完成数）+ 每 batch 1 个 ack → <b style="color:var(--green)">PP 各 rank 的 host radix tree 严格一致</b>。',
+     '✅ Closed loop: two MINs (set 1 hit count + set 2 completed tokens) + one ack per batch → <b style="color:var(--green)">every PP rank\u2019s host radix tree is strictly identical</b>.'],
+  ];
+
+  const SEL = ['header h1','header p','.tab','.tree-title','.caption','.banner','.legend .chip',
+    '#scene2 .note','.grp','.t3-title','.clabel','.flow-note','.lane-hint','.ctl',
+    '.pp-label','.pp-foot .lab','.former h5','.former .hitbox .ht1','.former .hitbox .ht2','.former .chk',
+    '.tbox .tname','.tbox .tdesc','.tarrow','.minnode','.whycard h4','.whycard p',
+    '.gpu-badge span','.l3lab','.syncbadge','.consist-flag','#scene6 .note','.barlabel'].join(',');
+
+  const tmp=document.createElement('div');
+  const strip=h=>{ tmp.innerHTML=h; return tmp.textContent.replace(/\s+/g,' ').trim(); };
+  const EN={}, ZH={};
+  PAIRS.forEach(([zh,en])=>{ EN[strip(zh)]=en; ZH[strip(en)]=zh; });
+
+  let LANG='zh';
+  // runtime helper for dynamic strings that contain interpolated values
+  // (cannot be matched by the static dictionary). Reads the live LANG.
+  window.TR=(zh,en)=> LANG==='en' ? en : zh;
+  let mo=null;
+  let suppress=false;   // re-entrancy guard: ignore mutations we cause ourselves
+  function translateEl(el){
+    const k=strip(el.innerHTML);
+    const next = LANG==='en' ? EN[k] : ZH[k];
+    // only write when there is a real change, otherwise we churn the DOM
+    if(next!==undefined && next!==el.innerHTML) el.innerHTML=next;
+  }
+  function translateAll(){
+    suppress=true;
+    document.querySelectorAll(SEL).forEach(translateEl);
+    if(mo) mo.takeRecords();   // drop the records our own writes just generated
+    suppress=false;
+  }
+
+  window.toggleLang=function(){
+    LANG = LANG==='zh' ? 'en' : 'zh';
+    document.getElementById('langBtn').textContent = LANG==='zh' ? 'EN' : '中文';
+    document.documentElement.lang = LANG==='zh' ? 'zh-CN' : 'en';
+    translateAll();
+  };
+
+  // keep dynamic captions translated as JS rewrites them
+  mo=new MutationObserver(muts=>{
+    if(suppress) return;       // skip the mutations our own translations produced
+    suppress=true;
+    muts.forEach(m=>{
+      const tgt = m.target.nodeType===1 ? m.target : m.target.parentElement;
+      if(!tgt) return;
+      const c = tgt.closest && tgt.closest(SEL);
+      if(c) translateEl(c);
+    });
+    mo.takeRecords();
+    suppress=false;
+  });
+  mo.observe(document.body,{subtree:true,childList:true,characterData:true});
+})();
+</script>
+<style>#langBtn,header,.tabs{display:none!important;}body{background:#0e1117;}.wrap{padding-top:10px;}</style>
+<script>
+(function(){
+  // English-only, single-tab embed: reuse all original JS
+  try{ if(window.toggleLang) toggleLang(); }catch(e){}   // zh -> en
+  var TAB="story";
+  var btn=document.querySelector('.tab[data-tab="'+TAB+'"]');
+  if(btn){ btn.click(); }
+})();
+</script>
+</body>
+</html>
diff --git a/public/images/blog/pp_hicache_consistency/hicache_pp_animation_en_skew.html b/public/images/blog/pp_hicache_consistency/hicache_pp_animation_en_skew.html
new file mode 100644
index 000000000..c80208beb
--- /dev/null
+++ b/public/images/blog/pp_hicache_consistency/hicache_pp_animation_en_skew.html
@@ -0,0 +1,1559 @@
+<!DOCTYPE html>
+<html lang="zh-CN">
+<head>
+<meta charset="UTF-8" />
+<meta name="viewport" content="width=device-width, initial-scale=1.0" />
+<title>HiCache × PP=3 · TP=8: Tree Consistency &amp; Deadlock Prevention Animation</title>
+<style>
+  :root{
+    --bg:#0e1117; --panel:#161b22; --panel2:#1c2330; --line:#30363d;
+    --text:#e6edf3; --muted:#8b949e;
+    --blue:#58a6ff; --green:#3fb950; --red:#f85149; --amber:#d29922;
+    --purple:#bc8cff; --cyan:#56d4dd;
+  }
+  *{box-sizing:border-box;}
+  body{
+    margin:0; background:radial-gradient(1200px 600px at 50% -10%, #18202c, var(--bg));
+    color:var(--text); font-family:-apple-system,BlinkMacSystemFont,"Segoe UI","PingFang SC","Microsoft YaHei",sans-serif;
+    line-height:1.5;
+  }
+  #langBtn{ position:fixed; top:14px; right:16px; z-index:50; background:var(--panel2); border:1px solid var(--blue);
+    color:var(--blue); padding:7px 14px; border-radius:999px; cursor:pointer; font-size:13px; font-weight:600; }
+  #langBtn:hover{ background:var(--blue); color:#04101f; }
+  header{ text-align:center; padding:20px 16px 4px; }
+  header h1{ margin:0 0 4px; font-size:21px; }
+  header p{ margin:0; color:var(--muted); font-size:13px; }
+  .tabs{ display:flex; gap:8px; justify-content:center; margin:16px auto 8px; flex-wrap:wrap; }
+  .tab{ background:var(--panel); border:1px solid var(--line); color:var(--text);
+    padding:9px 16px; border-radius:999px; cursor:pointer; font-size:14px; transition:all .15s; }
+  .tab.active{ background:var(--blue); color:#04101f; border-color:var(--blue); font-weight:600; }
+  .wrap{ max-width:1120px; margin:0 auto; padding:0 16px 60px; }
+  .scene{ background:var(--panel); border:1px solid var(--line); border-radius:14px; padding:18px; position:relative; }
+  .hidden{ display:none; }
+  .controls{ display:flex; gap:10px; justify-content:center; align-items:center; margin:14px 0 4px; flex-wrap:wrap; }
+  button.ctl{ background:var(--panel2); border:1px solid var(--line); color:var(--text);
+    padding:8px 16px; border-radius:8px; cursor:pointer; font-size:14px; }
+  button.ctl:hover{ border-color:var(--blue); }
+  button.ctl.primary{ background:var(--green); color:#04140a; border-color:var(--green); font-weight:600; }
+  button.ctl.alt{ background:var(--red); color:#1a0606; border-color:var(--red); font-weight:600; }
+  .caption{ text-align:center; min-height:44px; margin:10px auto 0; max-width:900px; font-size:15px; }
+  .caption .k{ color:var(--cyan); font-weight:600; }
+  code{ background:#0d1117; padding:1px 6px; border-radius:4px; border:1px solid var(--line); color:var(--cyan); font-size:12px; }
+  .legend{ display:flex; gap:18px; justify-content:center; font-size:12px; color:var(--muted); flex-wrap:wrap; margin-bottom:10px; }
+  .chip{ display:inline-flex; align-items:center; gap:6px; }
+  .sw{ width:14px; height:14px; border-radius:4px; display:inline-block; }
+
+  /* ---------- mesh ---------- */
+  .mesh-head{ display:flex; align-items:center; gap:8px; margin-left:96px; margin-bottom:4px; }
+  .tp-hdr{ flex:1; display:flex; gap:6px; }
+  .tp-hdr .th{ flex:1; text-align:center; font-size:11px; color:var(--muted); }
+  .pp-row{ display:flex; align-items:center; gap:8px; margin-bottom:6px; }
+  .pp-label{ width:88px; font-size:12px; color:var(--muted); text-align:right; line-height:1.2; }
+  .pp-label b{ color:var(--text); }
+  .row-cells{ flex:1; display:flex; gap:6px; border:2px solid transparent; border-radius:10px; padding:3px; transition:border-color .3s, box-shadow .3s; }
+  .row-cells.ring-a{ border-color:var(--purple); box-shadow:0 0 12px rgba(188,140,255,.25); }
+  .row-cells.ring-b{ border-color:var(--cyan); box-shadow:0 0 12px rgba(86,212,221,.25); }
+  .row-cells.ring-bad{ border-color:var(--red); box-shadow:0 0 12px rgba(248,81,73,.3); }
+  .cell{
+    flex:1; height:54px; border-radius:8px; background:#0d1117; border:1px solid var(--line);
+    display:flex; flex-direction:column; align-items:center; justify-content:center; gap:1px;
+    transition:background .3s, border-color .3s, transform .15s, box-shadow .15s; position:relative;
+  }
+  .cell .v{ font-size:18px; font-weight:800; color:var(--muted); transition:color .3s; }
+  .cell .v small{ font-size:9px; font-weight:500; }
+  .cell .rk{ font-size:9px; color:#5b6470; }
+  .cell.varied .v{ color:var(--amber); }
+  .cell.tpmin{ background:linear-gradient(180deg,#143055,#102844); border-color:var(--blue); }
+  .cell.tpmin .v{ color:#cfe6ff; }
+  .cell.gmin{ background:linear-gradient(180deg,#0f3a1d,#0e2c18); border-color:var(--green); }
+  .cell.gmin .v{ color:#c4f7d4; }
+  .cell.sweep{ transform:translateY(-3px); box-shadow:0 0 14px rgba(88,166,255,.55); border-color:var(--blue); }
+  .cell.bad{ background:linear-gradient(180deg,#3a1414,#2a1010); border-color:var(--red); }
+  .cell.bad .v{ color:#ffd4d0; }
+  .cell.dim{ opacity:.35; }
+  /* thread dots */
+  .tdots{ display:flex; gap:5px; margin-top:1px; }
+  .td{ width:9px; height:9px; border-radius:50%; border:1px solid var(--line); background:#0d1117; transition:all .25s; }
+  .td.a{ border-color:var(--purple); }
+  .td.b{ border-color:var(--cyan); }
+  .td.a.on{ background:var(--purple); box-shadow:0 0 8px var(--purple); }
+  .td.b.on{ background:var(--cyan); box-shadow:0 0 8px var(--cyan); }
+  .td.done{ background:var(--green); border-color:var(--green); box-shadow:0 0 6px var(--green); }
+  .td.dead{ background:var(--red); border-color:var(--red); box-shadow:0 0 6px var(--red); }
+
+  /* pp-group footer (columns) */
+  .pp-foot{ display:flex; align-items:center; gap:8px; margin-top:6px; }
+  .pp-foot .lab{ width:88px; font-size:11px; color:var(--muted); text-align:right; }
+  .pp-foot .cols{ flex:1; display:flex; gap:6px; padding:0 3px; }
+  .pp-foot .col{ flex:1; height:20px; border-radius:6px; border:1px dashed var(--line); font-size:9px;
+    color:#5b6470; display:flex; align-items:center; justify-content:center; transition:all .3s; }
+  .pp-foot .col.ring-a{ border-color:var(--purple); color:#e3d3ff; }
+  .pp-foot .col.ring-b{ border-color:var(--cyan); color:#cdf6fa; }
+  .pp-foot .col.ring-bad{ border-color:var(--red); color:#ffd4d0; }
+
+  /* shared tree (tab1) */
+  .tree-box{ margin-top:14px; display:flex; flex-direction:column; align-items:center; }
+  .tree-title{ font-size:12px; color:var(--muted); margin-bottom:6px; }
+  .tree{ display:flex; flex-direction:column; align-items:center; gap:4px; min-height:30px; }
+  .tnode{ width:220px; height:20px; border-radius:5px; background:#0d1117; border:1px solid var(--line);
+    display:flex; align-items:center; justify-content:center; font-size:11px; color:var(--muted);
+    opacity:0; transform:translateY(-6px); transition:opacity .25s, transform .25s; }
+  .tnode.show{ opacity:1; transform:none; background:linear-gradient(90deg,#0f3a1d,#196b32); border-color:var(--green); color:#c4f7d4; }
+
+  /* groups panel (tab2) */
+  .groups{ display:flex; gap:16px; justify-content:center; margin-top:12px; flex-wrap:wrap; }
+  .grp{ border:1px dashed var(--line); border-radius:10px; padding:8px 14px; font-size:12px; color:var(--muted);
+    min-width:230px; text-align:center; transition:all .3s; }
+  .grp b{ color:var(--text); }
+  .grp.g1.hot{ border-color:var(--purple); color:#e3d3ff; box-shadow:0 0 14px rgba(188,140,255,.25); }
+  .grp.g2.hot{ border-color:var(--cyan); color:#cdf6fa; box-shadow:0 0 14px rgba(86,212,221,.25); }
+  .banner{ text-align:center; font-weight:800; font-size:18px; min-height:24px; margin-top:8px; }
+  .banner.ok{ color:var(--green); } .banner.bad{ color:var(--red); }
+  .note{ font-size:12px; color:var(--muted); text-align:center; margin-top:4px; }
+
+  /* ---------- tab3: async skew x MIN ---------- */
+  .t3-section{ background:var(--panel2); border:1px solid var(--line); border-radius:12px; padding:10px 12px; margin-bottom:14px; }
+  .t3-title{ font-size:13px; margin-bottom:8px; display:flex; align-items:center; gap:8px; }
+  .t3-title .tag{ font-size:10px; padding:2px 8px; border-radius:999px; border:1px solid var(--line); color:var(--muted); }
+  .t3-title .tag.gpu{ border-color:var(--blue); color:#cfe6ff; }
+  .t3-title .tag.cpu{ border-color:var(--amber); color:#ffe2ab; }
+  /* top pipeline */
+  .pipe{ position:relative; }
+  .lane{ position:relative; height:34px; margin:6px 0; border-radius:8px; background:#0d1117;
+    border:1px solid var(--line); overflow:hidden; }
+  .lane .lname{ position:absolute; left:8px; top:50%; transform:translateY(-50%); font-size:11px; color:var(--muted); z-index:3; }
+  .mb{ position:absolute; top:6px; height:22px; width:52px; border-radius:6px; z-index:2;
+    display:flex; align-items:center; justify-content:center; font-size:10px; font-weight:700;
+    opacity:0; color:#c4f7d4; border:1px solid var(--green);
+    background:linear-gradient(180deg,#0f3a1d,#15311f); transition:opacity .18s; }
+  .lane-hint{ position:absolute; right:10px; top:50%; transform:translateY(-50%); font-size:10px; color:#46505e; z-index:1; }
+  .flow-note{ font-size:11px; color:var(--green); text-align:right; margin-top:2px; }
+
+  /* bottom sync area */
+  .sync-area{ position:relative; height:170px; border-radius:8px; background:#0d1117; border:1px solid var(--line); overflow:hidden; }
+  .barrier{ position:absolute; top:6px; bottom:6px; width:0; border-left:2px dashed var(--amber); z-index:2; }
+  .barrier .blabel{ position:absolute; top:-2px; left:8px; font-size:11px; color:var(--amber); white-space:nowrap; }
+  .barrier.fire{ border-left-color:var(--green); box-shadow:-2px 0 18px rgba(63,185,80,.5); }
+  .slane{ position:absolute; left:0; right:0; height:1px; border-top:1px dashed #1f2733; }
+  .slabel{ position:absolute; left:8px; font-size:11px; color:var(--muted); z-index:3; transform:translateY(-50%); }
+  .pkt{ position:absolute; height:30px; width:108px; border-radius:8px; z-index:3; transform:translateY(-50%);
+    display:flex; align-items:center; justify-content:center; gap:6px; font-size:11px; font-weight:700;
+    border:1px solid var(--line); background:#11161f; transition:background .25s, border-color .25s, box-shadow .25s; }
+  .pkt .hv{ font-size:14px; }
+  .pkt.travel{ border-color:var(--blue); color:#cfe6ff; }
+  .pkt.wait{ border-color:var(--amber); color:#ffe2ab; background:#1d1808; animation:pulse 1s infinite; }
+  .pkt.unified{ border-color:var(--green); color:#c4f7d4; background:linear-gradient(180deg,#0f3a1d,#0e2c18); box-shadow:0 0 12px rgba(63,185,80,.35); }
+  .clock{ position:absolute; right:10px; top:8px; font-size:11px; color:var(--muted); z-index:4; }
+  /* causal arrows + batch formers */
+  .t3-conduit{ position:relative; height:38px; margin:-2px 0 8px; display:flex; align-items:center; justify-content:center; }
+  .t3-conduit .clabel{ font-size:12px; color:var(--muted); transition:color .3s; z-index:2; background:var(--panel); padding:0 8px; }
+  .t3-conduit.hot .clabel{ color:var(--green); }
+  .t3-conduit::before{ content:""; position:absolute; left:50%; top:4px; bottom:4px; width:2px; transform:translateX(-50%);
+    background:repeating-linear-gradient(to top, #2b3340 0 6px, transparent 6px 12px); }
+  .t3-conduit.hot::before{ background:repeating-linear-gradient(to top, var(--green) 0 6px, transparent 6px 12px); opacity:.5; }
+  .t3-conduit .spark{ position:absolute; left:50%; bottom:3px; width:11px; height:11px; border-radius:50%;
+    background:var(--green); opacity:0; box-shadow:0 0 12px var(--green); transform:translateX(-50%); z-index:3; }
+  .t3-conduit.hot .spark{ animation:rise .85s linear infinite; }
+  .t3-conduit.hot .spark.s2{ animation-delay:.42s; }
+  @keyframes rise{ 0%{opacity:0; transform:translate(-50%,0);} 15%{opacity:1;} 100%{opacity:0; transform:translate(-50%,-34px);} }
+  .pipe.fed .lane{ border-color:var(--green); box-shadow:0 0 10px rgba(63,185,80,.25); }
+  .formers{ display:flex; gap:14px; justify-content:center; flex-wrap:wrap; }
+  .former{ flex:1; min-width:240px; max-width:320px; background:#0d1117; border:1px solid var(--line);
+    border-radius:10px; padding:10px 12px; transition:border-color .3s, box-shadow .3s; }
+  .former h5{ margin:0 0 6px; font-size:13px; }
+  .former .hitbox{ font-size:12px; color:var(--muted); margin-bottom:8px; }
+  .former .hitbox b{ color:var(--amber); font-size:14px; }
+  .former.ready{ border-color:var(--green); box-shadow:0 0 12px rgba(63,185,80,.2); }
+  .former.ready .hitbox b{ color:var(--green); }
+  .mbrow{ display:flex; gap:6px; }
+  .mbchip{ flex:1; height:24px; border-radius:6px; background:#11161f; border:1px solid var(--line);
+    display:flex; align-items:center; justify-content:center; font-size:11px; color:var(--muted); opacity:.3; transition:all .25s; }
+  .mbchip.on{ opacity:1; background:linear-gradient(180deg,#143055,#102844); border-color:var(--blue); color:#cfe6ff; }
+  .mbchip.fixed{ opacity:1; background:linear-gradient(180deg,#0f3a1d,#0e2c18); border-color:var(--green); color:#c4f7d4; }
+  .former .chk{ font-size:12px; color:var(--green); margin-top:6px; min-height:16px; }
+
+  /* ---------- tab4: thread relationships ---------- */
+  .t4wrap{ display:flex; gap:18px; flex-wrap:wrap; }
+  .t4flow{ flex:2; min-width:340px; display:flex; flex-direction:column; align-items:stretch; gap:0; }
+  .t4why{ flex:1; min-width:280px; display:flex; flex-direction:column; gap:10px; }
+  .tbox{ border:1px solid var(--line); border-radius:10px; padding:10px 12px; background:#0d1117; position:relative; }
+  .tbox .tname{ font-size:14px; font-weight:700; display:flex; align-items:center; gap:8px; }
+  .tbox .tdesc{ font-size:11.5px; color:var(--muted); margin-top:3px; }
+  .tbox .pin{ font-size:10px; padding:1px 7px; border-radius:999px; border:1px solid var(--line); color:var(--muted); }
+  .tbox.thread-hit{ border-left:4px solid var(--purple); }
+  .tbox.thread-io{ border-left:4px solid var(--blue); }
+  .tbox.thread-sync{ border-left:4px solid var(--cyan); }
+  .tbox.sched{ border-left:4px solid var(--muted); background:#11161f; }
+  .tarrow{ text-align:center; color:var(--muted); font-size:11px; padding:5px 0; position:relative; }
+  .tarrow b{ color:var(--text); }
+  .minnode{ align-self:center; margin:6px 0; border:1.5px solid var(--amber); border-radius:999px;
+    padding:7px 16px; font-size:12px; color:#ffe2ab; background:#1d1808; font-weight:600; }
+  .minnode.g2{ border-color:var(--green); color:#c4f7d4; background:#0f2a18; }
+  .minnode small{ display:block; font-size:10px; color:var(--muted); font-weight:400; }
+  .whycard{ border:1px solid var(--line); border-radius:10px; padding:11px 13px; background:var(--panel2); }
+  .whycard h4{ margin:0 0 6px; font-size:13px; }
+  .whycard p{ margin:0; font-size:12px; color:var(--muted); }
+  .whycard.a{ border-left:4px solid var(--purple); }
+  .whycard.b{ border-left:4px solid var(--cyan); }
+  .whycard.c{ border-left:4px solid var(--green); }
+  .whycard code{ font-size:11px; }
+
+  /* ---------- tab4 animation states ---------- */
+  #scene4 .tbox,#scene4 .tarrow,#scene4 .minnode,#scene4 .whycard{ transition:all .3s; }
+  #scene4 .dimmed{ opacity:.3; }
+  .tbox.lit{ box-shadow:0 0 18px rgba(88,166,255,.45); transform:translateX(5px); }
+  .tbox.thread-hit.lit{ box-shadow:0 0 18px rgba(188,140,255,.55); }
+  .tbox.thread-io.lit{ box-shadow:0 0 18px rgba(88,166,255,.55); }
+  .tbox.thread-sync.lit{ box-shadow:0 0 18px rgba(86,212,221,.55); }
+  .tbox.sched.lit{ box-shadow:0 0 18px rgba(63,185,80,.4); }
+  .tarrow.lit{ color:var(--green); font-weight:700; }
+  .tarrow.lit b{ color:var(--green); }
+  .tarrow.lit::after{ content:" ●"; color:var(--green); animation:t4blink .7s infinite; }
+  @keyframes t4blink{ 0%,100%{opacity:.2;} 50%{opacity:1;} }
+  .minnode.lit{ box-shadow:0 0 22px rgba(210,153,34,.65); transform:scale(1.06); }
+  .minnode.g2.lit{ box-shadow:0 0 22px rgba(63,185,80,.65); }
+  .whycard.lit{ box-shadow:0 0 18px rgba(63,185,80,.35); transform:translateY(-3px); border-left-width:6px; }
+
+  /* ---------- tab5: two-request full lifecycle story ---------- */
+  #scene5 .story-top{ display:flex; gap:14px; align-items:stretch; margin-bottom:14px; }
+  .gpu-badge{ width:120px; border:1px solid var(--line); border-radius:12px; background:#0d1117;
+    display:flex; flex-direction:column; align-items:center; justify-content:center; gap:2px;
+    font-size:12px; color:var(--muted); transition:all .3s; }
+  .gpu-badge .ic{ font-size:22px; }
+  .gpu-badge.busy{ border-color:var(--blue); color:#cfe6ff; box-shadow:0 0 18px rgba(88,166,255,.45);
+    background:linear-gradient(180deg,#143055,#102844); animation:gpupulse 1s infinite; }
+  @keyframes gpupulse{ 0%,100%{box-shadow:0 0 12px rgba(88,166,255,.3);} 50%{box-shadow:0 0 22px rgba(88,166,255,.6);} }
+  .l3box{ flex:1; border:1px solid var(--line); border-radius:12px; background:#0d1117; padding:8px 12px; transition:all .3s; }
+  .l3box.hot{ border-color:var(--green); box-shadow:0 0 14px rgba(63,185,80,.25); }
+  .l3title{ font-size:12px; color:var(--muted); margin-bottom:6px; display:flex; justify-content:space-between; align-items:center; }
+  .l3title .badge{ font-size:10px; padding:1px 8px; border-radius:999px; border:1px solid var(--line); color:var(--muted); min-width:46px; text-align:center; }
+  .l3title .badge.miss{ border-color:var(--red); color:#ffd4d0; }
+  .l3title .badge.hit{ border-color:var(--green); color:#c4f7d4; }
+  .pagerow{ display:flex; gap:6px; flex-wrap:wrap; min-height:28px; }
+  .pg{ width:38px; height:26px; border-radius:6px; border:1px solid var(--line); background:#11161f;
+    display:flex; align-items:center; justify-content:center; font-size:11px; color:#5b6470;
+    transition:all .3s; opacity:0; transform:scale(.6); }
+  .pg.show{ opacity:1; transform:none; }
+  .pg.l3{ border-color:var(--green); color:#c4f7d4; background:linear-gradient(180deg,#0f3a1d,#0e2c18); }
+
+  #scene5 .ranks{ display:flex; flex-direction:column; gap:10px; }
+  .ranklane{ border:1px solid var(--line); border-radius:12px; padding:8px 12px; background:var(--panel2); transition:all .3s; }
+  .ranklane.active{ border-color:var(--blue); box-shadow:0 0 12px rgba(88,166,255,.22); }
+  .rankhdr{ display:flex; align-items:center; gap:10px; margin-bottom:6px; font-size:12px; }
+  .rankhdr .rname{ font-weight:700; color:var(--text); }
+  .rankhdr .tps{ display:flex; gap:4px; }
+  .rankhdr .tp{ width:8px; height:8px; border-radius:50%; background:#11161f; border:1px solid var(--line); transition:all .25s; }
+  .rankhdr .tp.on{ background:var(--blue); border-color:var(--blue); box-shadow:0 0 6px var(--blue); }
+  .rankhdr .rstat{ margin-left:auto; font-size:11px; padding:1px 9px; border-radius:999px; border:1px solid var(--line); color:var(--muted); }
+  .rankhdr .rstat.miss{ border-color:var(--red); color:#ffd4d0; }
+  .rankhdr .rstat.hit{ border-color:var(--green); color:#c4f7d4; }
+  .rankhdr .rstat.warn{ border-color:var(--amber); color:#ffe2ab; }
+  .htree{ display:flex; align-items:center; gap:8px; min-height:28px; }
+  .htree .root{ font-size:10px; color:var(--muted); padding:2px 7px; border:1px dashed var(--line); border-radius:6px; }
+  .htnode{ width:40px; height:26px; border-radius:6px; border:1px solid var(--line); background:#11161f;
+    display:flex; align-items:center; justify-content:center; font-size:11px; color:#5b6470; position:relative;
+    transition:all .35s; opacity:0; transform:translateY(-6px) scale(.7); }
+  .htnode::before{ content:""; position:absolute; left:-8px; top:50%; width:8px; height:1px; background:var(--line); }
+  .htnode:first-of-type::before{ display:none; }
+  .htnode.show{ opacity:1; transform:none; }
+  .htnode.committed{ border-color:var(--green); color:#c4f7d4; background:linear-gradient(180deg,#0f3a1d,#0e2c18); }
+  .htnode.inserting{ border-color:var(--blue); color:#cfe6ff; background:linear-gradient(180deg,#143055,#102844); }
+  .htnode.matched{ border-color:var(--cyan); color:#cdf6fa; box-shadow:0 0 8px rgba(86,212,221,.4); }
+  .htnode.warn{ border-color:var(--amber); color:#ffe2ab; background:#1d1808; }
+  .htnode.evict{ border-color:var(--red); color:#ffd4d0; background:linear-gradient(180deg,#3a1414,#2a1010); }
+  .story-sync{ display:flex; gap:14px; justify-content:center; margin:14px 0 6px; flex-wrap:wrap; }
+  .syncbadge{ font-size:12px; padding:6px 14px; border-radius:999px; border:1.5px solid var(--line); color:var(--muted); transition:all .3s; }
+  .syncbadge.g1.fire{ border-color:var(--amber); color:#ffe2ab; background:#1d1808; box-shadow:0 0 16px rgba(210,153,34,.45); transform:scale(1.04); }
+  .syncbadge.g2.fire{ border-color:var(--green); color:#c4f7d4; background:#0f2a18; box-shadow:0 0 16px rgba(63,185,80,.45); transform:scale(1.04); }
+  .consist-flag{ text-align:center; font-weight:700; font-size:14px; min-height:20px; margin-top:4px; }
+  .consist-flag.ok{ color:var(--green); } .consist-flag.bad{ color:var(--red); }
+
+  /* ---------- tab6: PrefetchAck alignment & anti-hang ---------- */
+  #scene6 .ackmesh{ display:flex; flex-direction:column; gap:8px; margin-bottom:6px; }
+  .ackrow{ display:flex; align-items:center; gap:10px; border:1px solid var(--line); border-radius:10px; padding:6px 10px; background:var(--panel2); transition:all .3s; }
+  .ackrow.blocked{ border-color:var(--red); box-shadow:0 0 12px rgba(248,81,73,.3); }
+  .acklabel{ width:190px; font-size:12px; color:var(--muted); }
+  .acklabel b{ color:var(--text); }
+  .ackslots{ flex:1; display:flex; gap:8px; }
+  .ackchip{ flex:1; height:34px; border-radius:8px; border:1px solid var(--line); background:#0d1117;
+    display:flex; align-items:center; justify-content:center; font-size:11px; color:#5b6470; gap:5px;
+    transition:all .3s; position:relative; }
+  .ackchip .err{ font-size:9px; color:var(--red); }
+  .ackchip.pending{ opacity:.4; }
+  .ackchip.emit{ border-color:var(--blue); color:#cfe6ff; background:linear-gradient(180deg,#143055,#102844); box-shadow:0 0 10px rgba(88,166,255,.4); }
+  .ackchip.passed{ border-color:var(--green); color:#c4f7d4; background:linear-gradient(180deg,#0f3a1d,#0e2c18); }
+  .ackchip.wait{ border-color:var(--amber); color:#ffe2ab; background:#1d1808; animation:pulse 1s infinite; }
+  .ackchip.missing{ border-color:var(--red); border-style:dashed; color:#ffd4d0; background:#2a1010; }
+  .barriers{ border:1px dashed var(--line); border-radius:10px; padding:8px 10px; margin-top:4px; }
+  .barlabel{ font-size:12px; color:var(--muted); margin-bottom:6px; text-align:center; }
+  .barrow{ display:flex; gap:8px; align-items:stretch; }
+  .barrow .barspacer{ width:190px; }
+  .barcols{ flex:1; display:flex; gap:8px; }
+  .bar{ flex:1; border:1.5px solid var(--line); border-radius:8px; padding:6px 4px; text-align:center;
+    font-size:11px; color:var(--muted); transition:all .3s; }
+  .bar .bcount{ display:block; font-size:13px; font-weight:800; margin-top:2px; color:#5b6470; }
+  .bar.waiting{ border-color:var(--amber); color:#ffe2ab; background:#1d1808; animation:pulse 1s infinite; }
+  .bar.waiting .bcount{ color:#ffe2ab; }
+  .bar.fired{ border-color:var(--green); color:#c4f7d4; background:#0f2a18; }
+  .bar.fired .bcount{ color:#c4f7d4; }
+  .bar.dead{ border-color:var(--red); color:#ffd4d0; background:#2a1010; box-shadow:0 0 12px rgba(248,81,73,.4); }
+  .bar.dead .bcount{ color:#ffd4d0; }
+</style>
+</head>
+<body>
+<button id="langBtn" onclick="toggleLang()">EN</button>
+<header>
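+  <h1>HiCache × Pipeline Parallel: Tree Consistency &amp; Deadlock Prevention</h1>
+  <p>Topology <b>PP=3 × TP=8 = 24 ranks</b> · rows = TP groups, columns = PP groups · MIN all-reduce keeps the radix tree consistent · two sets of gloo groups keep background collectives from deadlocking</p>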
+</header>
+
+<div class="tabs">
+  <button class="tab active" data-tab="story">① Two-request lifecycle (L3 hit/miss · consistent host tree)</button>
+  <button class="tab" data-tab="consistency">② Tree consistency (autoplay)</button>
+  <button class="tab" data-tab="deadlock">③ Why two group sets do not deadlock</button>
+  <button class="tab" data-tab="skew">④ Async skew × MIN sets a common pace</button>
+  <button class="tab" data-tab="threads">⑤ Thread relationships &amp; tree consistency</button>
+  <button class="tab" data-tab="ackalign">⑥ PrefetchAck alignment &amp; hang prevention</button>
+</div>
+
+<div class="wrap">
+  <!-- ============ TAB 5 (story, shown first) ============ -->
+  <div class="scene" id="scene5">
+    <div class="legend">
+      <span class="chip"><span class="sw" style="background:var(--blue)"></span>GPU compute / inserting</span>
+      <span class="chip"><span class="sw" style="background:var(--cyan)"></span>match: prefix hit</span>
+      <span class="chip"><span class="sw" style="background:var(--amber)"></span>Ranks disagree (awaiting MIN)</span>
+      <span class="chip"><span class="sw" style="background:var(--green)"></span>Committed / consistent</span>
+      <span class="chip"><span class="sw" style="background:var(--red)"></span>Miss / evicted and deleted</span>
+    </div>
+    <div class="story-top">
+      <div class="gpu-badge" id="gpuBadge"><span class="ic">▣</span><span>GPU compute</span></div>
+      <div class="l3box" id="l3box">
+        <div class="l3title"><span class="l3lab"><b>L3 persistent storage</b> (storage backend, one view shared by the 3 ranks)</span><span class="badge" id="l3badge"></span></div>
+        <div class="pagerow" id="l3pages"></div>
+      </div>
+    </div>
+    <div class="ranks" id="ranks"></div>
+    <div class="story-sync">
+      <div class="syncbadge g1" id="s5sync1">◆ MIN group set 1 · prefetch_hits_sync_groups · storage_hit_count</div>
+      <div class="syncbadge g2" id="s5sync2">◆ MIN group set 2 · prefetch_completion_sync_groups · completed_tokens</div>
+    </div>
+    <div class="consist-flag" id="s5flag"></div>
+    <div class="caption" id="cap5">Autoplaying…</div>
+  </div>
+  <div class="controls" id="ctl5">
+    <button class="ctl primary" id="play5">⏸ Pause</button>
+    <button class="ctl" id="replay5">⟲ Replay</button>
+  </div>
+
+  <!-- ============ TAB 1 ============ -->
+  <div class="scene hidden" id="scene1">
+    <div class="legend">
+      <span class="chip"><span class="sw" style="background:var(--amber)"></span>Hit count truncated (inconsistent)</span>
+      <span class="chip"><span class="sw" style="background:var(--blue)"></span>After MIN within the TP group</span>
+      <span class="chip"><span class="sw" style="background:var(--green)"></span>After MIN within the PP group (globally consistent)</span>
+    </div>
+    <div id="mesh1"></div>
+    <div class="tree-box">
+      <div class="tree-title" id="treeTitle">All 24 ranks hold an identical radix tree</div>
+      <div class="tree" id="sharedTree"></div>
+    </div>
+    <div class="caption" id="cap1">Autoplaying…</div>
+  </div>
+  <div class="controls">
+    <button class="ctl primary" id="play1">⏸ Pause</button>
+    <button class="ctl" id="replay1">⟲ Replay</button>
+  </div>
+
+  <!-- ============ TAB 2 ============ -->
+  <div class="scene hidden" id="scene2">
+    <div class="legend">
+      <span class="chip"><span class="sw" style="background:var(--purple)"></span><b>prefetch_thread</b> (independent background thread) · reduce(storage_hit_count)</span>
+      <span class="chip"><span class="sw" style="background:var(--cyan)"></span><b>prefetch_sync_thread</b> (independent background thread) · reduce(completed_tokens)</span>
+    </div>
+    <div class="note" style="margin-bottom:8px;">Each cell = 1 rank running 2 independent background threads (the dots ●A ●B). Each row is one <b>TP communicator</b>; each column is one <b>PP communicator</b>.</div>
+    <div id="mesh2"></div>
+    <div class="groups">
+      <div class="grp g1" id="g1"><b>prefetch_hits_sync_groups</b><br>Reduction groups for hit page counts (TP ring + PP ring)<br><span style="font-size:11px">reduce(storage_hit_count)</span></div>
+      <div class="grp g2" id="g2"><b>prefetch_completion_sync_groups</b><br>Reduction groups for completed tokens (TP ring + PP ring)<br><span style="font-size:11px">reduce(completed_tokens)</span></div>
+    </div>
+    <div class="banner" id="banner2"></div>
+    <div class="caption" id="cap2">Pick a scenario: <b>1 group set</b> deadlocks; <b>2 group sets</b> are safe.</div>
+  </div>
+  <div class="controls" id="ctl2">
+    <button class="ctl alt" id="play1grp">▶ 1 group set (deadlock)</button>
+    <button class="ctl primary" id="play2grp">▶ 2 group sets (safe)</button>
+    <button class="ctl" id="reset2">Reset</button>
+  </div>
+
+  <!-- ============ TAB 3 ============ -->
+  <div class="scene hidden" id="scene3">
+    <!-- layer 3: pipeline timing (top) -->
+    <div class="t3-section">
+      <div class="t3-title">③ Main PP pipeline execution <strong>timeline</strong> <span class="tag gpu">NCCL · GPU</span>
+        <span style="color:var(--muted);font-size:11px;">Continuous, staggered flow, <strong style="color:var(--green)">never interrupted by background prefetch sync</strong></span>
+      </div>
+      <div class="pipe" id="pipe">
+        <div class="lane s0"><span class="lname">PP stage 0</span><span class="lane-hint" id="hint0"></span></div>
+        <div class="lane s1"><span class="lname">PP stage 1</span><span class="lane-hint" id="hint1"></span></div>
+        <div class="lane s2"><span class="lname">PP stage 2</span><span class="lane-hint" id="hint2"></span></div>
+      </div>
+      <div class="flow-note">↑ The pipeline runs exactly the <strong>mb0→mb3</strong> formed in ②, advancing diagonally in staggered steps through stage0→1→2</div>
+    </div>
+
+    <div class="t3-conduit" id="arrow1">
+      <span class="clabel">▲ The formed <b>batch &amp; micro-batch order</b> feeds the pipeline (content)</span>
+      <span class="spark"></span><span class="spark s2"></span>
+    </div>
+
+    <!-- layer 2: batch former (middle) -->
+    <div class="t3-section">
+      <div class="t3-title">② The three PP ranks form batches from the <strong>same storage hit</strong> (batch content must match on every rank)</div>
+      <div class="formers" id="formers"></div>
+    </div>
+
+    <div class="t3-conduit" id="arrow2">
+      <span class="clabel">▲ <code>all_reduce(MIN)</code> outputs the unified value <b>6</b> → determines the batch size</span>
+      <span class="spark"></span><span class="spark s2"></span>
+    </div>
+
+    <!-- layer 1: async prefetch + MIN (bottom = causal source) -->
+    <div class="t3-section">
+      <div class="t3-title">① Async prefetch query → <code>all_reduce(MIN)</code> <span class="tag cpu">gloo · CPU background thread</span></div>
+      <div class="sync-area" id="syncArea">
+        <div class="clock" id="t3clock">t = 0.0s</div>
+        <div class="barrier" id="barrier" style="left:62%"><span class="blabel">all_reduce(MIN)</span></div>
+        <div class="slabel" id="sl0">rank pp0</div><div class="pkt travel" id="pkt0"><span>pp0 storage hit</span><span class="hv">8</span></div>
+        <div class="slabel" id="sl1">rank pp1</div><div class="pkt travel" id="pkt1"><span>pp1 storage hit</span><span class="hv">6</span></div>
+        <div class="slabel" id="sl2">rank pp2</div><div class="pkt travel" id="pkt2"><span>pp2 storage hit</span><span class="hv">7</span></div>
+      </div>
+    </div>
+    <div class="caption" id="cap3">Autoplaying…</div>
+  </div>
+  <div class="controls" id="ctl3">
+    <button class="ctl primary" id="play3">⏸ Pause</button>
+    <button class="ctl" id="replay3">⟲ Replay</button>
+  </div>
+
+  <!-- ============ TAB 4 ============ -->
+  <div class="scene hidden" id="scene4">
+    <div class="t4wrap">
+      <!-- left: thread data-flow -->
+      <div class="t4flow">
+        <div class="tbox sched" id="t4b0">
+          <div class="tname">Scheduler <span class="pin">main thread</span></div>
+          <div class="tdesc">Issues prefetch requests (writeback / load)</div>
+        </div>
+        <div class="tarrow" id="t4a0">▼ <b>prefetch_queue</b> (PrefetchOperation)</div>
+
+        <div class="tbox thread-hit" id="t4b1">
+          <div class="tname">① prefetch_thread <span class="pin">storage-hit thread</span></div>
+          <div class="tdesc"><code>_storage_hit_query()</code> queries how many pages hit in L3; enough hits → push to prefetch_buffer, too few → prefetch_revoke_queue</div>
+        </div>
+        <div class="minnode" id="t4m1">◆ all_reduce(MIN) storage_hit_count
+          <small>@ prefetch_hits_sync_groups (group set 1, gloo/CPU, TP ring + PP ring)</small></div>
+        <div class="tarrow" id="t4a1">▼ <b>prefetch_buffer</b></div>
+
+        <div class="tbox thread-io" id="t4b2">
+          <div class="tname">② prefetch_io_aux_thread <span class="pin">IO load thread</span></div>
+          <div class="tdesc"><code>_page_transfer()</code> reads pages from L3 into host memory batch by batch; accumulates <b>completed_tokens</b>; <b>every storage batch emits exactly 1 PrefetchAck</b> (even on error)</div>
+        </div>
+        <div class="tarrow" id="t4a2">▼ <b>prefetch_sync_queue</b> (PrefetchAck)</div>
+
+        <div class="tbox thread-sync" id="t4b3">
+          <div class="tname">③ prefetch_sync_thread <span class="pin">completion-token thread</span></div>
+          <div class="tdesc">Reduces the <b>completed_tokens</b> of every ack</div>
+        </div>
+        <div class="minnode g2" id="t4m2">◆ all_reduce(MIN) completed_tokens
+          <small>@ prefetch_completion_sync_groups (group set 2, gloo/CPU, TP ring + PP ring)</small></div>
+        <div class="tarrow" id="t4a3">▼ <b>ack_prefetch_queue</b></div>
+
+        <div class="tbox sched" id="t4b4">
+          <div class="tname">Scheduler writes to the host radix tree</div>
+          <div class="tdesc">Inserts only the prefix of length <b>completed_tokens</b> → <code>_insert_helper_host()</code></div>
+        </div>
+      </div>
+
+      <!-- right: why consistent -->
+      <div class="t4why">
+        <div class="whycard a" id="t4wa">
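+          <h4>Why does MIN(storage_hit) give consistency?</h4>
+          <p>Hit counts can differ across ranks (host-memory truncation, differing L3 views). MIN selects the <b>longest prefix that every rank can hit</b> → all ranks <b>fetch the same range</b> instead of each fetching a different length.</p>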
+        </div>
+        <div class="whycard b" id="t4wb">
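+          <h4>Why does MIN(completed_tokens) give consistency?</h4>
+          <p>Even with an identical fetch range, the page-by-page load can still <b>partially fail</b> (<code>page_get</code> returns n≠batch). MIN commits only the <b>longest common prefix that every rank loaded successfully</b> → the length written to the host tree is the same on every rank.</p>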
+        </div>
+        <div class="whycard c" id="t4wc">
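+          <h4>Why does it never hang?</h4>
+          <p>Each storage batch emits <b>exactly one PrefetchAck</b> (even on error) → every rank joins <b>exactly the same number</b> of reduces, so the collectives line up one to one. Together the two MINs guarantee that <b>the prefix inserted into the host tree is identical on every rank → consistent trees</b>.</p>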
+        </div>
+      </div>
+    </div>
+    <div class="caption" id="cap4">Two MIN sync points (group set 1 for hit counts, group set 2 for completed tokens) plus exactly 1 ack per batch together keep the host radix tree strictly consistent across PP ranks.</div>
+  </div>
+  <div class="controls" id="ctl4">
+    <button class="ctl primary" id="play4">⏸ Pause</button>
+    <button class="ctl" id="replay4">⟲ Replay</button>
+  </div>
+
+  <!-- ============ TAB 6 : PrefetchAck alignment & anti-hang ============ -->
+  <div class="scene hidden" id="scene6">
+    <div class="note" style="margin-bottom:10px;">Each <b>storage batch</b> always emits <b>1 PrefetchAck</b> inside <code>_page_transfer</code>; <code>prefetch_sync_thread</code> runs one <code>all_reduce(MIN)</code> on group set 2 for <strong>every ack</strong>. So <b>ack count = batch count = number of group-set-2 collectives</b>, and it must be equal on every rank.</div>
+    <div class="legend">
+      <span class="chip"><span class="sw" style="background:var(--blue)"></span>Ack emitted (joins this reduce round)</span>
+      <span class="chip"><span class="sw" style="background:var(--green)"></span>Barrier reaches 3/3 → passes</span>
+      <span class="chip"><span class="sw" style="background:var(--amber)"></span>Arrived, waiting for the absent rank</span>
+      <span class="chip"><span class="sw" style="background:var(--red)"></span>Missing ack → waits forever</span>
+    </div>
+    <div class="ackmesh" id="ackmesh"></div>
+    <div class="barriers">
+      <div class="barlabel">◆ <code>all_reduce(MIN)</code> @ group set 2 (prefetch_completion_sync_groups) · one barrier per ack</div>
+      <div class="barrow"><div class="barspacer"></div><div class="barcols" id="barcols"></div></div>
+    </div>
+    <div class="banner" id="banner6"></div>
+    <div class="caption" id="cap6">Pick a scenario: <b>always 1 ack per batch</b> → counts align, safe; <b>break on error</b> → one ack missing → group-set-2 reduces misalign → hang.</div>
+  </div>
+  <div class="controls" id="ctl6">
+    <button class="ctl primary" id="play6good">▶ Correct (always 1 ack per batch)</button>
+    <button class="ctl alt" id="play6bad">▶ Wrong (break on error → missing ack)</button>
+    <button class="ctl" id="reset6">Reset</button>
+  </div>
+</div>
+
+<script>
+const PP=3, TP=8;
+// initial hit counts per (pp,tp). some truncated by host-mem pressure.
+const HITS=[
+  [8,8,8,8,8,8,8,8],
+  [8,8,6,8,8,8,8,7],   // stage1: rank(1,2)=6, rank(1,7)=7 truncated
+  [8,8,8,8,8,8,8,8],
+];
+const ROWMIN = HITS.map(r=>Math.min(...r));      // [8,6,8]
+const GMIN = Math.min(...ROWMIN);                // 6
+const sleep=ms=>new Promise(r=>setTimeout(r,ms));
+
+/* ---------- build a mesh ---------- */
+function buildMesh(containerId, withThreads){
+  const c=document.getElementById(containerId);
+  let html='<div class="mesh-head"><div class="tp-hdr">';
+  for(let t=0;t<TP;t++) html+=`<div class="th">TP ${t}</div>`;
+  html+='</div></div>';
+  for(let p=0;p<PP;p++){
+    html+=`<div class="pp-row"><div class="pp-label"><b>PP stage ${p}</b><br>(TP group)</div><div class="row-cells" id="${containerId}-row${p}">`;
+    for(let t=0;t<TP;t++){
+      html+=`<div class="cell" id="${containerId}-c${p}-${t}">
+        <div class="v">—</div>
+        <div class="rk">r${p*TP+t}</div>
+        ${withThreads?`<div class="tdots"><span class="td a" id="${containerId}-A-${p}-${t}"></span><span class="td b" id="${containerId}-B-${p}-${t}"></span></div>`:''}
+      </div>`;
+    }
+    html+='</div></div>';
+  }
+  // pp-group footer (columns)
+  html+='<div class="pp-foot"><div class="lab">PP groups (columns)<br>each column spans 3 stages →</div><div class="cols">';
+  for(let t=0;t<TP;t++) html+=`<div class="col" id="${containerId}-col${t}">r${t}·r${t+TP}·r${t+2*TP}</div>`;
+  html+='</div></div>';
+  c.innerHTML=html;
+}
+const cell=(m,p,t)=>document.getElementById(`${m}-c${p}-${t}`);
+const val =(m,p,t)=>cell(m,p,t).querySelector('.v');
+
+buildMesh('mesh1',false);
+buildMesh('mesh2',true);
+
+/* ============================================================
+   TAB 1 : auto-play loop
+   query -> diverge -> TP all_reduce(MIN) -> PP all_reduce(MIN) -> consistent tree
+   ============================================================ */
+let t1Token=0, t1Paused=false;
+const cap1=document.getElementById('cap1');
+const sharedTree=document.getElementById('sharedTree');
+
+function resetMesh1(){
+  for(let p=0;p<PP;p++) for(let t=0;t<TP;t++){
+    const cl=cell('mesh1',p,t); cl.className='cell';
+    val('mesh1',p,t).innerHTML='—';
+  }
+  sharedTree.innerHTML='';
+}
+async function gate(my){ while(t1Paused){ await sleep(120); if(my!==t1Token) throw 0; } }
+async function step(ms,my){ await sleep(ms); await gate(my); if(my!==t1Token) throw 0; }
+
+async function runTab1(){
+  const my=++t1Token;
+  try{
+    while(true){
+      resetMesh1();
+      cap1.innerHTML='Topology <b>PP=3 × TP=8 = 24 ranks</b>: each PP stage hosts 8 TP ranks.';
+      await step(1600,my);
+
+      // 1) independent query
+      for(let p=0;p<PP;p++) for(let t=0;t<TP;t++){
+        const v=HITS[p][t]; const cl=cell('mesh1',p,t);
+        val('mesh1',p,t).innerHTML=v+' <small>pg</small>';
+        if(v!==8) cl.classList.add('varied');
+        await sleep(35);
+      }
+      cap1.innerHTML='① Each rank queries L3 for prefix hits <span class="k">independently</span>. <b style="color:var(--amber)">Note that r10 and r15 are truncated by host memory pressure</b> (6 / 7 pages).';
+      await step(2200,my);
+
+      // 2) diverge warning
+      cap1.innerHTML='② If every rank built its radix tree from its own hit count → tree depths diverge → later PP collectives hit a <b style="color:var(--red)">shape mismatch → crash</b>.';
+      for(let p=0;p<PP;p++) for(let t=0;t<TP;t++) if(HITS[p][t]!==8){ cell('mesh1',p,t).classList.add('bad'); }
+      await step(2200,my);
+      for(let p=0;p<PP;p++) for(let t=0;t<TP;t++) cell('mesh1',p,t).classList.remove('bad','varied');
+
+      // 3) TP all_reduce(MIN) — sweep each row (all rows in parallel)
+      cap1.innerHTML='③ Step 1: <code>all_reduce(MIN)</code> within each <span class="k">TP group (one row of 8 ranks)</span>.';
+      for(let t=0;t<TP;t++){
+        for(let p=0;p<PP;p++) cell('mesh1',p,t).classList.add('sweep');
+        await step(110,my);
+        for(let p=0;p<PP;p++) cell('mesh1',p,t).classList.remove('sweep');
+      }
+      for(let p=0;p<PP;p++) for(let t=0;t<TP;t++){
+        const cl=cell('mesh1',p,t); cl.classList.add('tpmin');
+        val('mesh1',p,t).innerHTML=ROWMIN[p]+' <small>pg</small>';
+      }
+      cap1.innerHTML='③ After the TP reduction: <b>every row is consistent</b> (PP0=8, PP1=6, PP2=8, the minimum of each row).';
+      await step(1900,my);
+
+      // 4) PP all_reduce(MIN) — sweep each column (top->bottom)
+      cap1.innerHTML='④ Step 2: <code>all_reduce(MIN)</code> within each <span class="k">PP group (one column of 3 ranks)</span> → converges to the global minimum.';
+      for(let p=0;p<PP;p++){
+        for(let t=0;t<TP;t++) cell('mesh1',p,t).classList.add('sweep');
+        await step(180,my);
+        for(let t=0;t<TP;t++) cell('mesh1',p,t).classList.remove('sweep');
+      }
+      for(let p=0;p<PP;p++) for(let t=0;t<TP;t++){
+        const cl=cell('mesh1',p,t); cl.classList.remove('tpmin'); cl.classList.add('gmin');
+        val('mesh1',p,t).innerHTML=GMIN+' <small>pg</small>';
+      }
+      cap1.innerHTML='④ After the PP reduction: <b style="color:var(--green)">all 24 ranks have hit count = 6</b> (the longest common prefix).';
+      await step(1700,my);
+
+      // 5) shared consistent tree
+      cap1.innerHTML='⑤ Every rank prefetches and builds its tree only for the first 6 pages → <span style="color:var(--green)">the radix trees on all 24 ranks are identical ✓</span>';
+      sharedTree.innerHTML='';
+      for(let i=0;i<GMIN;i++){
+        const n=document.createElement('div'); n.className='tnode'; n.textContent='page '+i; sharedTree.appendChild(n);
+        await step(120,my); n.classList.add('show');
+      }
+      await step(2600,my);
+    }
+  }catch(e){ /* cancelled */ }
+}
+
+document.getElementById('play1').onclick=function(){
+  t1Paused=!t1Paused;
+  this.textContent=t1Paused?'▶ Play':'⏸ Pause';
+};
+document.getElementById('replay1').onclick=()=>{ t1Paused=false; document.getElementById('play1').textContent='⏸ Pause'; runTab1(); };
+
+/* ============================================================
+   TAB 2 : why 2 group-sets avoid deadlock (PP×TP mesh)
+   ============================================================ */
+let t2Token=0;
+const cap2=document.getElementById('cap2');
+const banner2=document.getElementById('banner2');
+const g1=document.getElementById('g1'), g2=document.getElementById('g2');
+const row2=p=>document.getElementById('mesh2-row'+p);
+const col2=t=>document.getElementById('mesh2-col'+t);
+const dotEl=(op,p,t)=>document.getElementById(`mesh2-${op}-${p}-${t}`);
+
+function resetMesh2(){
+  ++t2Token;
+  for(let p=0;p<PP;p++){
+    row2(p).className='row-cells';
+    for(let t=0;t<TP;t++){
+      const cl=cell('mesh2',p,t); cl.className='cell';
+      val('mesh2',p,t).innerHTML=GMIN+' <small>pg</small>';
+      dotEl('A',p,t).className='td a'; dotEl('B',p,t).className='td b';
+    }
+  }
+  for(let t=0;t<TP;t++) col2(t).className='col';
+  g1.classList.remove('hot'); g2.classList.remove('hot'); g2.style.opacity=1;
+  banner2.className='banner'; banner2.textContent='';
+}
+async function s2(ms,my){ await sleep(ms); if(my!==t2Token) throw 0; }
+
+/* ---- 1 shared group set -> deadlock ---- */
+async function play1Group(){
+  resetMesh2(); const my=t2Token;
+  try{
+    g2.style.opacity=.25; g1.classList.add('hot');
+    cap2.innerHTML='Only <b>1 group set</b>: prefetch_thread (A) and prefetch_sync_thread (B) share the same set of communicators.';
+    await s2(800,my);
+
+    // each rank's two threads race: some submit A first (purple), some B first (cyan)
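+    cap2.innerHTML='The two background threads are <b>scheduled independently, in no fixed order</b>: within one TP ring, some ranks issue A first while others issue B first.';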
+    for(let p=0;p<PP;p++){
+      for(let t=0;t<TP;t++){
+        const aFirst=((p+t)%2===0);      // deterministic but mixed within each row
+        const first=aFirst?'A':'B';
+        dotEl(first,p,t).classList.add('on');
+      }
+    }
+    await s2(1200,my);
+
+    // rings cannot align -> red
+    cap2.innerHTML='On a single communicator, the ranks submit <b style="color:var(--red)">different collectives</b> (A and B are misaligned) → the rendezvous never matches.';
+    for(let p=0;p<PP;p++){
+      row2(p).classList.add('ring-bad');
+      for(let t=0;t<TP;t++){
+        cell('mesh2',p,t).classList.add('bad');
+        const aFirst=((p+t)%2===0);
+        dotEl(aFirst?'A':'B',p,t).className=`td ${aFirst?'a':'b'} dead`;
+      }
+    }
+    for(let t=0;t<TP;t++) col2(t).classList.add('ring-bad');
+    await s2(700,my);
+    banner2.className='banner bad'; banner2.textContent='💥 DEADLOCK: the entire 24-rank job hangs';
+    cap2.innerHTML='Once A and B interleave on any communicator, that ring deadlocks → PP/TP communication stalls across the whole job in a chain reaction.';
+  }catch(e){}
+}
+
+/* ---- 2 group sets -> safe ---- */
+async function play2Groups(){
+  resetMesh2(); const my=t2Token;
+  try{
+    g1.classList.add('hot'); g2.classList.add('hot');
+    cap2.innerHTML='With <b>2 independent group sets</b>: <b style="color:var(--purple)">A always uses prefetch_hits_sync_groups</b>, and <b style="color:var(--cyan)">B always uses prefetch_completion_sync_groups</b>.';
+    await s2(800,my);
+
+    // wave A: every rank's prefetch_thread submits A to group-set-1; TP rings + PP rings light purple
+    cap2.innerHTML='Wave 1: the <b>prefetch_thread</b> on every rank submits A only on <code>prefetch_hits_sync_groups</code> → identical sequence everywhere.';
+    for(let p=0;p<PP;p++){
+      row2(p).classList.add('ring-a');
+      for(let t=0;t<TP;t++) dotEl('A',p,t).classList.add('on');
+    }
+    for(let t=0;t<TP;t++) col2(t).classList.add('ring-a');
+    await s2(1100,my);
+    for(let p=0;p<PP;p++){
+      row2(p).className='row-cells';
+      for(let t=0;t<TP;t++) dotEl('A',p,t).className='td a done';
+    }
+    for(let t=0;t<TP;t++) col2(t).className='col';
+    cap2.innerHTML='✓ A has arrived on every TP ring and PP ring → the first reduction wave completes.';
+    await s2(900,my);
+
+    // wave B: prefetch_sync_thread submits B to group-set-2; rings light cyan
+    cap2.innerHTML='Wave 2: the <b>prefetch_sync_thread</b> on every rank submits B only on <code>prefetch_completion_sync_groups</code> → identical sequence everywhere.';
+    for(let p=0;p<PP;p++){
+      row2(p).classList.add('ring-b');
+      for(let t=0;t<TP;t++) dotEl('B',p,t).classList.add('on');
+    }
+    for(let t=0;t<TP;t++) col2(t).classList.add('ring-b');
+    await s2(1100,my);
+    for(let p=0;p<PP;p++){
+      row2(p).className='row-cells';
+      for(let t=0;t<TP;t++){ dotEl('B',p,t).className='td b done'; cell('mesh2',p,t).classList.add('gmin'); }
+    }
+    for(let t=0;t<TP;t++) col2(t).className='col';
+    banner2.className='banner ok'; banner2.textContent='✅ Safe: all 24 ranks aligned and finished';
+    cap2.innerHTML='The collective sequence on each communicator is <b style="color:var(--green)">identical</b> on all ranks (A → group set 1, B → group set 2, never crossing) → no deadlock.';
+  }catch(e){}
+}
+
+document.getElementById('play1grp').onclick=play1Group;
+document.getElementById('play2grp').onclick=play2Groups;
+document.getElementById('reset2').onclick=()=>{ resetMesh2(); cap2.innerHTML='Pick a scenario: <b>1 group set</b> deadlocks; <b>2 group sets</b> are safe.'; };
+
+/* ============================================================
+   TAB 3 : async PP skew  x  all_reduce(MIN) unifies pace
+   Top: continuous, skewed micro-batch pipeline (CSS infinite) — never pauses.
+   Bottom: time-driven (rAF). Each rank's prefetch op arrives at the MIN barrier
+   at a DIFFERENT wall-clock time (async skew). Early arrivals park & wait on the
+   gloo CPU group (background thread). When the slowest arrives, one MIN flash
+   unifies all three to 6 and they depart together — while the top pipeline keeps
+   flowing untouched.
+   ============================================================ */
+const NMB=4;                          // micro-batches per batch (illustrative)
+const MB_LABELS=Array.from({length:NMB},(_,k)=>'mb'+k);
+// build top pipeline micro-batches: one controllable block per (stage, mb)
+(function buildPipe(){
+  for(let s=0;s<3;s++){
+    const lane=document.querySelector('#pipe .s'+s);
+    for(let k=0;k<NMB;k++){
+      const mb=document.createElement('div');
+      mb.className='mb'; mb.id=`pmb-${s}-${k}`; mb.textContent=MB_LABELS[k];
+      lane.appendChild(mb);
+    }
+  }
+})();
+// build batch formers (one per PP rank): hit input + ordered mb chips
+(function buildFormers(){
+  const host=document.getElementById('formers');
+  for(let p=0;p<3;p++){
+    const f=document.createElement('div'); f.className='former'; f.id='former'+p;
+    let chips='';
+    for(let k=0;k<NMB;k++) chips+=`<div class="mbchip" id="fchip-${p}-${k}">${MB_LABELS[k]}</div>`;
+    f.innerHTML=`<h5>调度器 · PP rank ${p}</h5>
+      <div class="hitbox"><span class="ht1">已缓存前缀 storage hit = </span><b id="fhit${p}">？</b><span class="ht2"> 页 → 决定 batch 组成</span></div>
+      <div class="mbrow">${chips}</div>
+      <div class="chk" id="fchk${p}"></div>`;
+    host.appendChild(f);
+  }
+})();
+const arrow1=document.getElementById('arrow1');
+const arrow2=document.getElementById('arrow2');
+
+const PKT=[
+  { el:document.getElementById('pkt0'), lab:document.getElementById('sl0'), y:34,  hit:8, arrive:2.6 },
+  { el:document.getElementById('pkt1'), lab:document.getElementById('sl1'), y:85,  hit:6, arrive:1.9 }, // arrives first
+  { el:document.getElementById('pkt2'), lab:document.getElementById('sl2'), y:136, hit:7, arrive:3.9 }, // slowest -> everyone waits
+];
+const GMIN3=Math.min(...PKT.map(p=>p.hit)); // 6
+const T3={ START:0.4, SYNC:3.9, FLASH_END:4.5, DEPART_END:5.4,
+           DROP:4.5, MB_START:5.0, MB_STEP:0.35, READY:6.4, CYCLE:14.0,
+           PIPE_START:6.4, MB_LAG:1.0, STAGE_LAG:0.9, MB_TRAVEL:3.2,
+           X0:11, XB:62, X1:97 }; // seconds / percentages
+const cap3=document.getElementById('cap3');
+const barrier=document.getElementById('barrier');
+const t3clock=document.getElementById('t3clock');
+let t3raf=null, t3on=false, t3paused=false, t3start=0, t3lastT=0;
+
+function placePkt(p){ p.lab.style.top=p.y+'px'; p.el.style.top=p.y+'px'; }
+PKT.forEach(placePkt);
+function lerp(a,b,u){ return a+(b-a)*Math.max(0,Math.min(1,u)); }
+// returns progress 0..1 if t (or its wrapped form) is inside [st,en), else -1
+function pipeActive(st,en,t){
+  if(t>=st && t<en) return (t-st)/(en-st);
+  const tw=t+T3.CYCLE;
+  if(tw>=st && tw<en) return (tw-st)/(en-st);   // previous batch still draining across loop
+  return -1;
+}
+
+function resetFormers(){
+  arrow1.classList.remove('hot'); arrow2.classList.remove('hot');
+  for(let p=0;p<3;p++){
+    document.getElementById('fhit'+p).textContent='？';
+    document.getElementById('former'+p).classList.remove('ready');
+    document.getElementById('fchk'+p).textContent='';
+    for(let k=0;k<NMB;k++) document.getElementById(`fchip-${p}-${k}`).className='mbchip';
+  }
+}
+
+function t3frame(now){
+  if(!t3on){ return; }
+  if(!t3paused){
+    let t=((now - t3start)/1000) % T3.CYCLE;
+    t3lastT=t;
+    t3clock.textContent='t = '+t.toFixed(1)+'s';
+    barrier.classList.toggle('fire', (t>=T3.SYNC && t<T3.FLASH_END));
+
+    // --- layer 1: async packets toward MIN barrier ---
+    PKT.forEach(p=>{
+      let xPct, cls;
+      if(t < T3.START){ xPct=T3.X0; cls='travel'; }
+      else if(t < p.arrive){ xPct=lerp(T3.X0, T3.XB, (t-T3.START)/(p.arrive-T3.START)); cls='travel'; }
+      else if(t < T3.FLASH_END){ xPct=T3.XB; cls='wait'; }
+      else if(t < T3.DEPART_END){ xPct=lerp(T3.XB, T3.X1, (t-T3.FLASH_END)/(T3.DEPART_END-T3.FLASH_END)); cls='unified'; }
+      else { xPct=T3.X1; cls='unified'; }
+      p.el.style.left='calc('+xPct+'% - 54px)';
+      p.el.className='pkt '+cls;
+      p.el.querySelector('.hv').textContent = (t>=T3.SYNC? GMIN3 : p.hit);
+    });
+
+    // --- layer 2: batch formers driven by unified value ---
+    if(t < T3.FLASH_END){
+      resetFormers();
+    } else {
+      arrow2.classList.add('hot');                 // MIN -> value out
+      for(let p=0;p<3;p++) document.getElementById('fhit'+p).textContent=GMIN3;
+      // light mb chips in identical order across all three formers
+      let lit=0;
+      for(let k=0;k<NMB;k++){
+        if(t >= T3.MB_START + k*T3.MB_STEP){
+          for(let p=0;p<3;p++) document.getElementById(`fchip-${p}-${k}`).classList.add('on');
+          lit++;
+        }
+      }
+      if(t >= T3.READY){
+        arrow1.classList.add('hot');               // batch -> pipeline content
+        for(let p=0;p<3;p++){
+          document.getElementById('former'+p).classList.add('ready');
+          document.getElementById('fchk'+p).textContent='✓ batch & mb 顺序一致';
+          for(let k=0;k<NMB;k++) document.getElementById(`fchip-${p}-${k}`).className='mbchip fixed';
+        }
+      } else {
+        arrow1.classList.remove('hot');
+      }
+    }
+
+    // --- layer 3: the formed mb0..mb3 flow through stage0->1->2 (diagonal pipeline) ---
+    for(let s=0;s<3;s++){
+      for(let k=0;k<NMB;k++){
+        const b=document.getElementById(`pmb-${s}-${k}`);
+        const st=T3.PIPE_START + k*T3.MB_LAG + s*T3.STAGE_LAG;
+        const u=pipeActive(st, st+T3.MB_TRAVEL, t);   // handles cycle wrap (previous batch still draining)
+        if(u>=0){ b.style.left=(19+u*73)+'%'; b.style.opacity=1; }
+        else b.style.opacity=0;
+      }
+      document.getElementById('hint'+s).textContent = (t>1.4 && t<T3.PIPE_START) ? '（等待②组好的 batch…）' : '';
+    }
+
+    // --- captions ---
+    if(t < T3.START) cap3.innerHTML='① 三个 PP rank 的 prefetch 查询<strong>异步发起</strong>（到达时刻不同）。';
+    else if(t < PKT[2].arrive) cap3.innerHTML='① 先到的 rank 在 <span class="k">gloo CPU 后台线程</span>上<b style="color:var(--amber)">等待对齐</b>（不占 GPU）。';
+    else if(t < T3.FLASH_END) cap3.innerHTML='① <b style="color:var(--amber)">pp2 最慢</b>到达 → <code>all_reduce(MIN)</code> 把 8/6/7 <strong style="color:var(--green)">统一成 6</strong>。';
+    else if(t < T3.READY) cap3.innerHTML='② 统一后的 <b>storage hit = 6</b> 下发给各 rank 调度器 → 决定<strong>已缓存前缀长度 / batch size / micro-batch 顺序</strong>（mb0→mb3）。';
+    else cap3.innerHTML='③ 三个 rank 因拿到<strong style="color:var(--green)">同一个 6</strong> 而组出<strong style="color:var(--green)">完全一致的 batch 与 mb 顺序</strong>，喂给 PP 流水线；执行时序连续不被打断。<br><span style="color:var(--red)">⚠ 若 storage hit 不统一 → batch/mb 顺序逐 rank 发散 → PP 调度错位、卡死。</span>';
+  }
+  t3raf=requestAnimationFrame(t3frame);
+}
+function startTab3(restart){
+  t3on=true;
+  if(restart || !t3start){ t3start=performance.now(); t3paused=false; document.getElementById('play3').textContent='⏸ 暂停'; }
+  document.getElementById('pipe').classList.remove('paused');
+  if(!t3raf) t3raf=requestAnimationFrame(t3frame);
+}
+function stopTab3(){ t3on=false; if(t3raf){ cancelAnimationFrame(t3raf); t3raf=null; } }
+
+document.getElementById('play3').onclick=function(){
+  t3paused=!t3paused;
+  // Re-anchor the clock to the frozen t; t3frame skips updates while t3paused is set.
+  t3start=performance.now() - t3lastT*1000;
+  this.textContent=t3paused?'▶ 播放':'⏸ 暂停';
+  document.getElementById('pipe').classList.toggle('paused', t3paused);
+};
+document.getElementById('replay3').onclick=()=>startTab3(true);
+
+/* ============================================================
+   TAB 4 : animated walk-through of the prefetch thread pipeline.
+   Highlights each stage in sequence; the chain "lights up" as the
+   data (PrefetchOperation → Ack → completed_tokens) flows down, and
+   the matching right-side why-card glows at each MIN sync point.
+   ============================================================ */
+let t4Token=0, t4Paused=false;
+const cap4El=document.getElementById('cap4');
+// [ids-to-light, caption]
+const T4SEQ=[
+  [['t4b0'], '调度器主线程把 prefetch 请求（writeback / load）放入队列，触发后台流水线。'],
+  [['t4a0'], '<b>PrefetchOperation</b> 入队 <code>prefetch_queue</code>，交给后台线程处理。'],
+  [['t4b1'], '① <b>prefetch_thread</b> 调 <code>_storage_hit_query()</code> 查询 L3 命中页数（各 rank 可能不同）。'],
+  [['t4m1','t4wa'], '◆ <b style="color:var(--amber)">第一个 MIN</b>：在 <code>prefetch_hits_sync_groups</code>（组1）对 storage_hit_count 取最小 → <b>抓取范围逐 rank 一致</b>。'],
+  [['t4a1'], '命中足够的请求落入 <code>prefetch_buffer</code>，进入实际 IO 加载。'],
+  [['t4b2'], '② <b>prefetch_io_aux_thread</b> 用 <code>_page_transfer()</code> 逐 batch 把页 L3→host；<b>每个 batch 恒产生 1 个 PrefetchAck</b>（出错也产生）。'],
+  [['t4a2'], '每个 batch 的 <b>PrefetchAck</b> 入队 <code>prefetch_sync_queue</code>。'],
+  [['t4b3'], '③ <b>prefetch_sync_thread</b> 对每个 ack 的 <b>completed_tokens</b> 做归约。'],
+  [['t4m2','t4wb'], '◆ <b style="color:var(--green)">第二个 MIN</b>：在 <code>prefetch_completion_sync_groups</code>（组2）对 completed_tokens 取最小 → <b>真正落盘前缀逐 rank 一致</b>。'],
+  [['t4a3'], '统一后的结果入队 <code>ack_prefetch_queue</code> 回到调度器。'],
+  [['t4b4','t4wc'], '调度器只插入 <b>completed_tokens</b> 长度的前缀 → <code>_insert_helper_host()</code>。每 batch 恒 1 个 ack，<b>reduce 次数严格相等 → 不会 hang</b>。'],
+];
+function t4clear(){
+  document.querySelectorAll('#scene4 .lit').forEach(e=>e.classList.remove('lit'));
+  document.querySelectorAll('#scene4 .dimmed').forEach(e=>e.classList.remove('dimmed'));
+}
+async function t4gate(my){ while(t4Paused){ await sleep(120); if(my!==t4Token) throw 0; } }
+async function t4step(ms,my){ await sleep(ms); await t4gate(my); if(my!==t4Token) throw 0; }
+async function runTab4(){
+  const my=++t4Token;
+  try{
+    while(true){
+      t4clear();
+      cap4El.innerHTML='沿数据流向下逐步点亮：两个 MIN 同步点 + 每 batch 恒定 1 个 ack。';
+      await t4step(1200,my);
+      for(const [ids,cap] of T4SEQ){
+        await t4gate(my);
+        ids.forEach(id=>document.getElementById(id).classList.add('lit'));
+        cap4El.innerHTML=cap;
+        await t4step(1900,my);
+      }
+      cap4El.innerHTML='✅ 闭环：两个 MIN（组1 命中数 + 组2 完成数）+ 每 batch 1 个 ack → <b style="color:var(--green)">PP 各 rank 的 host radix tree 严格一致</b>。';
+      await t4step(2600,my);
+    }
+  }catch(e){ /* cancelled */ }
+}
+function stopTab4(){ ++t4Token; }
+
+document.getElementById('play4').onclick=function(){
+  t4Paused=!t4Paused;
+  this.textContent=t4Paused?'▶ 播放':'⏸ 暂停';
+};
+document.getElementById('replay4').onclick=()=>{ t4Paused=false; document.getElementById('play4').textContent='⏸ 暂停'; runTab4(); };
+
+/* ============================================================
+   TAB 5 : two-request full lifecycle (PP=3 × TP=4).
+   Req A misses (GPU compute → L2 insert → L3 backup), then L2 is
+   evicted (delete, identical across ranks); Req B hits L3 and goes
+   through the two MIN syncs so every PP rank inserts the SAME prefix
+   into its host radix tree → trees stay consistent, no deadlock.
+   ============================================================ */
+const NPG=4;                       // pages tracked in the story
+const RANK_NAMES=['PP rank 0','PP rank 1','PP rank 2'];
+(function buildStory(){
+  let h='';
+  for(let p=0;p<3;p++){
+    let tps=''; for(let t=0;t<4;t++) tps+=`<span class="tp" id="s5tp-${p}-${t}"></span>`;
+    let nodes='<span class="root">host root</span>';
+    for(let i=0;i<NPG;i++) nodes+=`<div class="htnode" id="s5n-${p}-${i}">p${i}</div>`;
+    h+=`<div class="ranklane" id="s5lane${p}">
+      <div class="rankhdr"><span class="rname">${RANK_NAMES[p]}</span><span class="tps">${tps}</span>
+        <span class="rstat" id="s5stat${p}">idle</span></div>
+      <div class="htree">${nodes}</div></div>`;
+  }
+  document.getElementById('ranks').innerHTML=h;
+  let l3=''; for(let i=0;i<NPG;i++) l3+=`<div class="pg" id="s5l3-${i}">p${i}</div>`;
+  document.getElementById('l3pages').innerHTML=l3;
+})();
+
+let t5Token=0, t5Paused=false;
+const cap5=document.getElementById('cap5');
+const s5n=(p,i)=>document.getElementById(`s5n-${p}-${i}`);
+const setNode=(p,i,cls)=>{ s5n(p,i).className='htnode show '+cls; };
+const hideNode=(p,i)=>{ s5n(p,i).className='htnode'; };
+function rstat(p,txt,cls){ const e=document.getElementById('s5stat'+p); e.className='rstat '+(cls||''); e.textContent=txt; }
+function s5flag(txt,cls){ const e=document.getElementById('s5flag'); e.className='consist-flag '+(cls||''); e.innerHTML=txt; }
+function s5reset(){
+  for(let p=0;p<3;p++){
+    document.getElementById('s5lane'+p).className='ranklane';
+    rstat(p,'idle','');
+    for(let t=0;t<4;t++) document.getElementById(`s5tp-${p}-${t}`).className='tp';
+    for(let i=0;i<NPG;i++) hideNode(p,i);
+  }
+  for(let i=0;i<NPG;i++) document.getElementById('s5l3-'+i).className='pg';
+  document.getElementById('gpuBadge').className='gpu-badge';
+  document.getElementById('l3box').className='l3box';
+  document.getElementById('l3badge').className='badge'; document.getElementById('l3badge').textContent='';
+  document.getElementById('s5sync1').className='syncbadge g1';
+  document.getElementById('s5sync2').className='syncbadge g2';
+  s5flag('','');
+}
+async function t5gate(my){ while(t5Paused){ await sleep(120); if(my!==t5Token) throw 0; } }
+async function t5step(ms,my){ await sleep(ms); await t5gate(my); if(my!==t5Token) throw 0; }
+const allRanks=fn=>{ for(let p=0;p<3;p++) fn(p); };
+
+async function runTab5(){
+  const my=++t5Token;
+  try{
+    while(true){
+      s5reset();
+      cap5.innerHTML='场景 <b>PP=3 × TP=4</b>：每个 PP rank 维护一棵 <b>L2 host radix tree</b>，共享底层 <b>L3 持久化存储</b>。跟踪两个请求，看 host tree 如何保持一致。';
+      await t5step(2000,my);
+
+      /* ===== ACT 1 : Request A — miss → GPU → L2 insert → L3 backup ===== */
+      cap5.innerHTML='① <b>请求 A</b> 到达（需要 4 个 page 的前缀），3 个 PP rank 同时处理。';
+      allRanks(p=>{ document.getElementById('s5lane'+p).classList.add('active'); for(let t=0;t<4;t++) document.getElementById(`s5tp-${p}-${t}`).className='tp on'; rstat(p,'req A',''); });
+      await t5step(1700,my);
+
+      cap5.innerHTML='① 查 L2 host tree → <b style="color:var(--red)">miss</b>；查 L3 → <b style="color:var(--red)">miss</b>（存储为空）。';
+      allRanks(p=>rstat(p,'L2/L3 miss','miss'));
+      document.getElementById('l3badge').className='badge miss'; document.getElementById('l3badge').textContent='miss';
+      await t5step(1900,my);
+
+      cap5.innerHTML='① 回退到 <b>GPU 前向计算</b>，生成这 4 个 page 的 KV。';
+      document.getElementById('gpuBadge').classList.add('busy');
+      allRanks(p=>rstat(p,'compute','warn'));
+      await t5step(1800,my);
+      document.getElementById('gpuBadge').classList.remove('busy');
+
+      cap5.innerHTML='① 计算结果写入 <b>L2 host radix tree</b> → 3 个 rank <code>insert</code> <strong style="color:var(--green)">相同</strong>的前缀 p0–p3。';
+      for(let i=0;i<NPG;i++){ allRanks(p=>setNode(p,i,'inserting')); await t5step(240,my); }
+      allRanks(p=>{ for(let i=0;i<NPG;i++) setNode(p,i,'committed'); rstat(p,'L2: 4','hit'); });
+      s5flag('✓ 3 棵 host tree 同步插入 4 个 page（一致）','ok');
+      await t5step(1600,my);
+
+      cap5.innerHTML='① backup 线程把 L2 → <b>L3</b> 持久化（<code>write_backup</code> / <code>page_set</code>）。';
+      document.getElementById('l3box').classList.add('hot');
+      document.getElementById('l3badge').className='badge hit'; document.getElementById('l3badge').textContent='stored';
+      for(let i=0;i<NPG;i++){ document.getElementById('s5l3-'+i).className='pg show l3'; await t5step(220,my); }
+      s5flag('','');
+      await t5step(1500,my);
+
+      /* ===== ACT 1.5 : L2 eviction (delete consistency) ===== */
+      cap5.innerHTML='② host 内存压力 → L2 触发<strong style="color:var(--red)">淘汰</strong>（<code>evict_host</code>）。3 棵 host tree <b>完全一致</b> → 淘汰命中<strong>同一批节点</strong>；L3 仍保留。';
+      for(let i=NPG-1;i>=0;i--){ allRanks(p=>setNode(p,i,'evict')); await t5step(330,my); allRanks(p=>hideNode(p,i)); }
+      allRanks(p=>rstat(p,'L2 empty',''));
+      s5flag('✓ 3 棵 host tree 同步删除（delete 一致）','ok');
+      await t5step(2000,my);
+      s5flag('','');
+
+      /* ===== ACT 2 : Request B — L3 hit → 2 MIN syncs → consistent insert ===== */
+      cap5.innerHTML='③ <b>请求 B</b> 到达（复用 A 的前缀）。L2 host tree 已空 → <b style="color:var(--red)">L2 miss</b>，转向 L3。';
+      allRanks(p=>{ document.getElementById('s5lane'+p).classList.add('active'); rstat(p,'req B','warn'); });
+      await t5step(1700,my);
+
+      cap5.innerHTML='③ <b>prefetch_thread</b> 各 rank 向 L3 查命中页数 → 结果可能<strong style="color:var(--amber)">不同</strong>（host 视图/内存差异）：4 / 3 / 4。';
+      const hitq=[4,3,4];
+      allRanks(p=>{ rstat(p,'L3 hit '+hitq[p], hitq[p]===4?'hit':'warn'); for(let i=0;i<hitq[p];i++) setNode(p,i,'warn'); });
+      s5flag('⚠ 查询长度不一致（4/3/4）→ 若各自建树，host tree 会发散','bad');
+      await t5step(2600,my);
+
+      cap5.innerHTML='◆ <b>第一个 MIN</b> @ <code>prefetch_hits_sync_groups</code>（组1，gloo/CPU，含 TP环+PP环）：<code>all_reduce(MIN)</code> 统一查询长度 = <b>3</b>。';
+      document.getElementById('s5sync1').classList.add('fire');
+      await t5step(1500,my);
+      allRanks(p=>{ for(let i=0;i<NPG;i++) hideNode(p,i); for(let i=0;i<3;i++) setNode(p,i,'matched'); rstat(p,'match 3','hit'); });
+      s5flag('✓ 抓取范围统一 = 3 → match_prefix 逐 rank 一致','ok');
+      await t5step(2200,my);
+
+      cap5.innerHTML='③ <b>prefetch_io_aux_thread</b> 逐 batch 把 page 从 L3 拉回 L2（<code>_page_transfer</code>），每 batch 产 1 个 PrefetchAck。';
+      document.getElementById('s5sync1').classList.remove('fire');
+      for(let i=0;i<3;i++){ allRanks(p=>setNode(p,i,'inserting')); await t5step(300,my); }
+      await t5step(700,my);
+
+      cap5.innerHTML='③ 逐页加载<strong style="color:var(--amber)">部分失败</strong>：rank2 第 3 页 <code>page_get</code> 未成功 → completed_tokens = 3 / 3 / 2。';
+      const done=[3,3,2];
+      allRanks(p=>{ for(let i=0;i<3;i++){ if(i<done[p]) setNode(p,i,'committed'); else setNode(p,i,'warn'); } rstat(p,'done '+done[p], done[p]===3?'hit':'warn'); });
+      s5flag('⚠ 实际落盘不一致（3/3/2）→ 若各自插入，host tree 会发散','bad');
+      await t5step(2600,my);
+
+      cap5.innerHTML='◆ <b>第二个 MIN</b> @ <code>prefetch_completion_sync_groups</code>（组2，<b>独立 communicator</b>）：<code>all_reduce(MIN)</code> 统一 completed_tokens = <b>2</b>。';
+      document.getElementById('s5sync2').classList.add('fire');
+      await t5step(1500,my);
+
+      cap5.innerHTML='③ 各 rank 只把统一的 <b>2 个 page</b> 插入 L2 host tree（<code>_insert_helper_host</code>）→ 3 棵 host tree <strong style="color:var(--green)">再次完全一致</strong>。';
+      allRanks(p=>{ for(let i=0;i<NPG;i++) hideNode(p,i); for(let i=0;i<2;i++) setNode(p,i,'committed'); rstat(p,'L2: 2','hit'); });
+      s5flag('✓ 插入长度统一 = 2 → 3 棵 host tree 完全一致','ok');
+      await t5step(2400,my);
+
+      cap5.innerHTML='✅ 两套<strong>独立 gloo 组</strong>（组1 命中数、组2 完成数）+ 每 batch 恒 1 个 ack → 各 rank 对 host tree 的<strong>插入/删除完全一致</strong> → <b style="color:var(--green)">host radix tree 始终一致，后台 collective 不会死锁</b>。';
+      document.getElementById('s5sync1').classList.add('fire');
+      await t5step(3400,my);
+      document.getElementById('s5sync1').classList.remove('fire');
+      document.getElementById('s5sync2').classList.remove('fire');
+      await t5step(700,my);
+    }
+  }catch(e){ /* cancelled */ }
+}
+function stopTab5(){ ++t5Token; }
+
+document.getElementById('play5').onclick=function(){
+  t5Paused=!t5Paused;
+  this.textContent=t5Paused?'▶ 播放':'⏸ 暂停';
+};
+document.getElementById('replay5').onclick=()=>{ t5Paused=false; document.getElementById('play5').textContent='⏸ 暂停'; runTab5(); };
+
+/* ============================================================
+   TAB 6 : PrefetchAck count alignment & anti-hang.
+   Each storage batch in _page_transfer emits exactly one PrefetchAck,
+   and prefetch_sync_thread does one all_reduce(MIN) on set-2 per ack.
+   So #acks == #batches == #set-2 collectives, and it must be equal on
+   every rank. We compare: (good) each batch always emits 1 ack even on
+   error → counts aligned → safe; (bad) break-on-error drops an ack →
+   one rank does fewer reduces → the others block forever → hang.
+   ============================================================ */
+const T6NB=3;                 // number of storage batches / acks
+(function buildAck(){
+  let m='';
+  for(let p=0;p<3;p++){
+    let slots='';
+    for(let k=0;k<T6NB;k++) slots+=`<div class="ackchip pending" id="ack-${p}-${k}">ack${k}</div>`;
+    m+=`<div class="ackrow" id="ackrow${p}"><div class="acklabel"><b>PP rank ${p}</b> · _page_transfer</div><div class="ackslots">${slots}</div></div>`;
+  }
+  document.getElementById('ackmesh').innerHTML=m;
+  let b='';
+  for(let k=0;k<T6NB;k++) b+=`<div class="bar" id="bar${k}">barrier ${k}<span class="bcount" id="bcnt${k}">0/3</span></div>`;
+  document.getElementById('barcols').innerHTML=b;
+})();
+
+let t6Token=0;
+const cap6=document.getElementById('cap6');
+const banner6=document.getElementById('banner6');
+const ackEl=(p,k)=>document.getElementById(`ack-${p}-${k}`);
+const barEl=k=>document.getElementById('bar'+k);
+const bcnt=k=>document.getElementById('bcnt'+k);
+function t6reset(){
+  ++t6Token;
+  for(let p=0;p<3;p++){
+    document.getElementById('ackrow'+p).className='ackrow';
+    for(let k=0;k<T6NB;k++){ ackEl(p,k).className='ackchip pending'; ackEl(p,k).innerHTML=`ack${k}`; }
+  }
+  for(let k=0;k<T6NB;k++){ barEl(k).className='bar'; bcnt(k).textContent='0/3'; }
+  banner6.className='banner'; banner6.textContent='';
+}
+async function s6(ms,my){ await sleep(ms); if(my!==t6Token) throw 0; }
+
+// nacks: how many acks each rank produces (rank index -> count)
+async function playAck(nacks, label){
+  t6reset(); const my=t6Token;
+  try{
+    cap6.innerHTML=label;
+    await s6(700,my);
+    for(let k=0;k<T6NB;k++){
+      barEl(k).classList.add('waiting');
+      let arrived=0;
+      // ranks emit ack k one by one (async arrival)
+      for(let p=0;p<3;p++){
+        await s6(520,my);
+        if(k<nacks[p]){
+          ackEl(p,k).className='ackchip emit';
+          arrived++; bcnt(k).textContent=arrived+'/3';
+        }
+      }
+      await s6(400,my);
+      if(arrived===3){
+        barEl(k).className='bar fired'; bcnt(k).textContent='3/3 ✓';
+        for(let p=0;p<3;p++) ackEl(p,k).className='ackchip passed';
+        cap6.innerHTML=(window.TR||((z,e)=>z))(
+          `barrier ${k}：3 个 rank 的 ack 都到齐 → <code>all_reduce(MIN)</code> 返回 → 本轮通过。`,
+          `barrier ${k}: all 3 ranks' acks arrived → <code>all_reduce(MIN)</code> returns → this round passes.`);
+        await s6(700,my);
+      }else{
+        // a rank is missing this ack -> collective can never complete
+        barEl(k).className='bar dead'; bcnt(k).textContent=arrived+'/3 ✗';
+        for(let p=0;p<3;p++){
+          if(k<nacks[p]){ ackEl(p,k).className='ackchip wait'; document.getElementById('ackrow'+p).classList.add('blocked'); }
+          else { ackEl(p,k).className='ackchip missing'; ackEl(p,k).innerHTML=`ack${k}<span class="err">${(window.TR||((z,e)=>z))('缺失','missing')}</span>`; }
+        }
+        cap6.innerHTML=(window.TR||((z,e)=>z))(
+          `barrier ${k}：只有 <b>${arrived}/3</b> 个 rank 进了 <code>all_reduce</code>（有 rank 早早 break、少产一个 ack）→ 已到达的 rank <b style="color:var(--amber)">永远阻塞</b>在这次 collective 上。`,
+          `barrier ${k}: only <b>${arrived}/3</b> ranks entered <code>all_reduce</code> (a rank broke early and emitted one fewer ack) → the ranks that arrived are <b style="color:var(--amber)">blocked forever</b> on this collective.`);
+        banner6.className='banner bad'; banner6.textContent='💥 HANG：组2 reduce 次数不一致（3/3/2）→ collective 永不返回';
+        return;
+      }
+    }
+    banner6.className='banner ok'; banner6.textContent='✅ 安全：每 rank 都做了 '+T6NB+' 次 reduce，次数严格相等，全部对齐完成';
+    cap6.innerHTML='每个 batch 恒产 1 个 ack（出错也产）→ <b>ack 数逐 rank 相等</b> → 组2 的 collective 一一对应 → 不会 hang。';
+  }catch(e){ /* cancelled */ }
+}
+function stopTab6(){ ++t6Token; }
+document.getElementById('play6good').onclick=()=>playAck([3,3,3],
+  '<b style="color:var(--green)">正确</b>：即便某 batch 出错，<code>_page_transfer</code> 也<strong>继续循环、照常产 ack</strong> → 三个 rank 都产 3 个 ack。');
+document.getElementById('play6bad').onclick=()=>playAck([3,3,2],
+  '<b style="color:var(--red)">错误（反面教材）</b>：rank2 在 batch2 出错就 <code>break</code> → 只产 2 个 ack，比别人少一个。');
+document.getElementById('reset6').onclick=()=>{ t6reset(); cap6.innerHTML='选择场景：<b>每 batch 恒 1 ack</b> → 次数对齐、安全；<b>出错就 break</b> → ack 缺一个 → 组2 reduce 错位 → hang。'; };
+
+/* ---------- tab switching ---------- */
+const ctl1=document.querySelectorAll('.controls')[1]; // [0] is now ctl5 (story)
+const ctl2=document.getElementById('ctl2');
+const ctl3=document.getElementById('ctl3');
+const ctl4=document.getElementById('ctl4');
+const ctl5=document.getElementById('ctl5');
+const ctl6=document.getElementById('ctl6');
+document.querySelectorAll('.tab').forEach(tab=>{
+  tab.onclick=()=>{
+    document.querySelectorAll('.tab').forEach(x=>x.classList.remove('active'));
+    tab.classList.add('active');
+    const w=tab.dataset.tab;
+    document.getElementById('scene5').classList.toggle('hidden', w!=='story');
+    document.getElementById('scene1').classList.toggle('hidden', w!=='consistency');
+    document.getElementById('scene2').classList.toggle('hidden', w!=='deadlock');
+    document.getElementById('scene3').classList.toggle('hidden', w!=='skew');
+    document.getElementById('scene4').classList.toggle('hidden', w!=='threads');
+    document.getElementById('scene6').classList.toggle('hidden', w!=='ackalign');
+    ctl5.style.display = w==='story'?'flex':'none';
+    ctl1.style.display = w==='consistency'?'flex':'none';
+    ctl2.style.display = w==='deadlock'?'flex':'none';
+    ctl3.style.display = w==='skew'?'flex':'none';
+    ctl4.style.display = w==='threads'?'flex':'none';
+    ctl6.style.display = w==='ackalign'?'flex':'none';
+    if(w==='story'){ ++t1Token; stopTab3(); stopTab4(); stopTab6(); t5Paused=false; document.getElementById('play5').textContent='⏸ 暂停'; runTab5(); }
+    else if(w==='consistency'){ t1Paused=false; document.getElementById('play1').textContent='⏸ 暂停'; runTab1(); stopTab3(); stopTab4(); stopTab5(); stopTab6(); }
+    else if(w==='skew'){ ++t1Token; startTab3(true); stopTab4(); stopTab5(); stopTab6(); }
+    else if(w==='threads'){ ++t1Token; stopTab3(); stopTab5(); stopTab6(); t4Paused=false; document.getElementById('play4').textContent='⏸ 暂停'; runTab4(); }
+    else if(w==='ackalign'){ ++t1Token; stopTab3(); stopTab4(); stopTab5(); t6reset(); }
+    else{ ++t1Token; stopTab3(); stopTab4(); stopTab5(); stopTab6(); }
+  };
+});
+ctl1.style.display='none';
+ctl2.style.display='none';
+ctl3.style.display='none';
+ctl4.style.display='none';
+ctl6.style.display='none';
+resetMesh2();
+runTab5();
+</script>
+
+<script>
+/* ============================================================
+   i18n: translate by text-content (robust to innerHTML normalization).
+   PAIRS = [zhHTML, enHTML]; keys derived from stripped textContent.
+   A MutationObserver re-translates dynamic captions on the fly.
+   ============================================================ */
+(function(){
+  const PAIRS = [
+    // header
+    ['HiCache × Pipeline Parallel：树一致性 & 防死锁','HiCache × Pipeline Parallel: Tree Consistency & Deadlock Avoidance'],
+    ['拓扑 <b>PP=3 × TP=8 = 24 ranks</b> · 行=TP 组、列=PP 组 · MIN all-reduce 保证 radix tree 一致 · 2 套 gloo 组避免后台 collective 死锁',
+     'Topology <b>PP=3 × TP=8 = 24 ranks</b> · rows = TP groups, cols = PP groups · MIN all-reduce keeps radix trees identical · 2 gloo group-sets avoid background-collective deadlock'],
+    // tabs
+    ['① 两请求全流程（L3 命中/未命中 · host tree 一致）','① Two-Request Lifecycle (L3 miss/hit · host-tree consistency)'],
+    ['② 树一致性（自动播放）','② Tree Consistency (auto-play)'],
+    ['③ 为什么 2 个组不死锁','③ Why 2 Groups Avoid Deadlock'],
+    ['④ 异步时间差 × MIN 统一步调','④ Async Skew × MIN Lockstep'],
+    ['⑤ 线程关系 & 树一致性','⑤ Thread Relationships & Consistency'],
+    ['⑥ PrefetchAck 对齐 & 防 hang','⑥ PrefetchAck Alignment & Anti-Hang'],
+    // tab6 note / legend / barrier label / scenario captions / banners
+    ['每个 <b>storage batch</b> 在 <code>_page_transfer</code> 里恒产 <b>1 个 PrefetchAck</b>；<code>prefetch_sync_thread</code> 对<strong>每个 ack</strong> 在组2 做一次 <code>all_reduce(MIN)</code>。所以 <b>ack 数 = batch 数 = 组2 collective 次数</b>，必须逐 rank 相等。',
+     'Each <b>storage batch</b> in <code>_page_transfer</code> always emits <b>exactly 1 PrefetchAck</b>; <code>prefetch_sync_thread</code> does one <code>all_reduce(MIN)</code> on set 2 <strong>per ack</strong>. So <b>#acks = #batches = #set-2 collectives</b>, and it must be equal on every rank.'],
+    ['<span class="sw" style="background:var(--blue)"></span>ack 已产出（参与本轮 reduce）','<span class="sw" style="background:var(--blue)"></span>ack emitted (joins this reduce)'],
+    ['<span class="sw" style="background:var(--green)"></span>barrier 凑齐 3/3 → 通过','<span class="sw" style="background:var(--green)"></span>barrier reaches 3/3 → pass'],
+    ['<span class="sw" style="background:var(--amber)"></span>已到达，等待缺席方','<span class="sw" style="background:var(--amber)"></span>arrived, waiting for the absent rank'],
+    ['<span class="sw" style="background:var(--red)"></span>缺失 ack → 永远等不到','<span class="sw" style="background:var(--red)"></span>missing ack → never arrives'],
+    ['◆ <code>all_reduce(MIN)</code> @ 组2（prefetch_completion_sync_groups）· 每个 ack 一次 barrier',
+     '◆ <code>all_reduce(MIN)</code> @ set 2 (prefetch_completion_sync_groups) · one barrier per ack'],
+    ['选择场景：<b>每 batch 恒 1 ack</b> → 次数对齐、安全；<b>出错就 break</b> → ack 缺一个 → 组2 reduce 错位 → hang。',
+     'Pick a scenario: <b>one ack per batch</b> → counts aligned, safe; <b>break on error</b> → one ack missing → set-2 reduces misalign → hang.'],
+    ['<b style="color:var(--green)">正确</b>：即便某 batch 出错，<code>_page_transfer</code> 也<strong>继续循环、照常产 ack</strong> → 三个 rank 都产 3 个 ack。',
+     '<b style="color:var(--green)">Correct</b>: even if a batch errors, <code>_page_transfer</code> <strong>keeps looping and still emits the ack</strong> → all three ranks emit 3 acks.'],
+    ['<b style="color:var(--red)">错误（反面教材）</b>：rank2 在 batch2 出错就 <code>break</code> → 只产 2 个 ack，比别人少一个。',
+     '<b style="color:var(--red)">Wrong (anti-pattern)</b>: rank2 hits an error at batch2 and <code>break</code>s → emits only 2 acks, one fewer than the others.'],
+    ['▶ 正确（每 batch 恒 1 ack）','▶ Correct (one ack per batch)'],
+    ['▶ 错误（出错 break → ack 缺失）','▶ Wrong (break on error → missing ack)'],
+    ['每个 batch 恒产 1 个 ack（出错也产）→ <b>ack 数逐 rank 相等</b> → 组2 的 collective 一一对应 → 不会 hang。',
+     'Each batch always emits one ack (even on error) → <b>ack counts equal across ranks</b> → set-2 collectives match one-to-one → no hang.'],
+    ['✅ 安全：每 rank 都做了 3 次 reduce，次数严格相等，全部对齐完成','✅ Safe: every rank did 3 reduces, counts strictly equal, all aligned and complete'],
+    ['💥 HANG：组2 reduce 次数不一致（3/3/2）→ collective 永不返回','💥 HANG: set-2 reduce counts differ (3/3/2) → the collective never returns'],
+    // tab5 legend chips
+    ['<span class="sw" style="background:var(--blue)"></span>GPU 计算 / 插入中','<span class="sw" style="background:var(--blue)"></span>GPU compute / inserting'],
+    ['<span class="sw" style="background:var(--cyan)"></span>match 命中前缀','<span class="sw" style="background:var(--cyan)"></span>matched prefix'],
+    ['<span class="sw" style="background:var(--amber)"></span>各 rank 不一致（待 MIN 统一）','<span class="sw" style="background:var(--amber)"></span>differs across ranks (awaiting MIN)'],
+    ['<span class="sw" style="background:var(--green)"></span>已提交 / 一致','<span class="sw" style="background:var(--green)"></span>committed / consistent'],
+    ['<span class="sw" style="background:var(--red)"></span>未命中 / 淘汰删除','<span class="sw" style="background:var(--red)"></span>miss / evicted'],
+    // tab5 static labels
+    ['<b>L3 持久化存储</b>（storage backend，3 个 rank 共享视图）',
+     '<b>L3 persistent storage</b> (storage backend, shared view across 3 ranks)'],
+    ['GPU 计算','GPU compute'],
+    ['◆ MIN 组1 · prefetch_hits_sync_groups · storage_hit_count','◆ MIN set 1 · prefetch_hits_sync_groups · storage_hit_count'],
+    ['◆ MIN 组2 · prefetch_completion_sync_groups · completed_tokens','◆ MIN set 2 · prefetch_completion_sync_groups · completed_tokens'],
+    // tab5 consistency flags
+    ['✓ 3 棵 host tree 同步插入 4 个 page（一致）','✓ all 3 host trees insert 4 pages in sync (consistent)'],
+    ['✓ 3 棵 host tree 同步删除（delete 一致）','✓ all 3 host trees delete in sync (consistent)'],
+    ['⚠ 查询长度不一致（4/3/4）→ 若各自建树，host tree 会发散','⚠ query lengths differ (4/3/4) → trees built independently would diverge'],
+    ['✓ 抓取范围统一 = 3 → match_prefix 逐 rank 一致','✓ fetch range unified = 3 → match_prefix identical per rank'],
+    ['⚠ 实际落盘不一致（3/3/2）→ 若各自插入，host tree 会发散','⚠ actual loads differ (3/3/2) → independent inserts would make the host trees diverge'],
+    ['✓ 插入长度统一 = 2 → 3 棵 host tree 完全一致','✓ insert length unified = 2 → all 3 host trees identical'],
+    // tab5 step captions
+    ['场景 <b>PP=3 × TP=4</b>：每个 PP rank 维护一棵 <b>L2 host radix tree</b>，共享底层 <b>L3 持久化存储</b>。跟踪两个请求，看 host tree 如何保持一致。',
+     'Scenario <b>PP=3 × TP=4</b>: each PP rank keeps an <b>L2 host radix tree</b> over a shared <b>L3 persistent storage</b>. We follow two requests and see how the trees stay consistent.'],
+    ['① <b>请求 A</b> 到达（需要 4 个 page 的前缀），3 个 PP rank 同时处理。',
+     '① <b>Request A</b> arrives (needs a 4-page prefix); all 3 PP ranks process it together.'],
+    ['① 查 L2 host tree → <b style="color:var(--red)">miss</b>；查 L3 → <b style="color:var(--red)">miss</b>（存储为空）。',
+     '① Query L2 host tree → <b style="color:var(--red)">miss</b>; query L3 → <b style="color:var(--red)">miss</b> (storage empty).'],
+    ['① 回退到 <b>GPU 前向计算</b>，生成这 4 个 page 的 KV。',
+     '① Fall back to <b>GPU forward compute</b> to produce the KV for these 4 pages.'],
+    ['① 计算结果写入 <b>L2 host radix tree</b> → 3 个 rank <code>insert</code> <strong style="color:var(--green)">相同</strong>的前缀 p0–p3。',
+     '① Results are written into the <b>L2 host radix tree</b> → all 3 ranks <code>insert</code> the <strong style="color:var(--green)">same</strong> prefix p0–p3.'],
+    ['① backup 线程把 L2 → <b>L3</b> 持久化（<code>write_backup</code> / <code>page_set</code>）。',
+     '① The backup thread persists L2 → <b>L3</b> (<code>write_backup</code> / <code>page_set</code>).'],
+    ['② host 内存压力 → L2 触发<strong style="color:var(--red)">淘汰</strong>（<code>evict_host</code>）。3 棵 host tree <b>完全一致</b> → 淘汰命中<strong>同一批节点</strong>；L3 仍保留。',
+     '② Host-memory pressure → L2 <strong style="color:var(--red)">eviction</strong> (<code>evict_host</code>). The 3 host trees are <b>identical</b> → eviction hits the <strong>same nodes</strong>; L3 keeps them.'],
+    ['③ <b>请求 B</b> 到达（复用 A 的前缀）。L2 host tree 已空 → <b style="color:var(--red)">L2 miss</b>，转向 L3。',
+     '③ <b>Request B</b> arrives (reuses A\u2019s prefix). The L2 host tree is empty → <b style="color:var(--red)">L2 miss</b>, fall through to L3.'],
+    ['③ <b>prefetch_thread</b> 各 rank 向 L3 查命中页数 → 结果可能<strong style="color:var(--amber)">不同</strong>（host 视图/内存差异）：4 / 3 / 4。',
+     '③ <b>prefetch_thread</b> on each rank queries L3 for its hit-page count → results may <strong style="color:var(--amber)">differ</strong> (host view / memory differences): 4 / 3 / 4.'],
+    ['◆ <b>第一个 MIN</b> @ <code>prefetch_hits_sync_groups</code>（组1，gloo/CPU，含 TP环+PP环）：<code>all_reduce(MIN)</code> 统一查询长度 = <b>3</b>。',
+     '◆ <b>First MIN</b> @ <code>prefetch_hits_sync_groups</code> (set 1, gloo/CPU, TP+PP rings): <code>all_reduce(MIN)</code> unifies the query length = <b>3</b>.'],
+    ['③ <b>prefetch_io_aux_thread</b> 逐 batch 把 page 从 L3 拉回 L2（<code>_page_transfer</code>），每 batch 产 1 个 PrefetchAck。',
+     '③ <b>prefetch_io_aux_thread</b> pulls pages L3→L2 batch by batch (<code>_page_transfer</code>), one PrefetchAck per batch.'],
+    ['③ 逐页加载<strong style="color:var(--amber)">部分失败</strong>：rank2 第 3 页 <code>page_get</code> 未成功 → completed_tokens = 3 / 3 / 2。',
+     '③ Per-page load <strong style="color:var(--amber)">partially fails</strong>: <code>page_get</code> for rank2\u2019s 3rd page fails → completed_tokens = 3 / 3 / 2.'],
+    ['◆ <b>第二个 MIN</b> @ <code>prefetch_completion_sync_groups</code>（组2，<b>独立 communicator</b>）：<code>all_reduce(MIN)</code> 统一 completed_tokens = <b>2</b>。',
+     '◆ <b>Second MIN</b> @ <code>prefetch_completion_sync_groups</code> (set 2, <b>independent communicator</b>): <code>all_reduce(MIN)</code> unifies completed_tokens = <b>2</b>.'],
+    ['③ 各 rank 只把统一的 <b>2 个 page</b> 插入 L2 host tree（<code>_insert_helper_host</code>）→ 3 棵 host tree <strong style="color:var(--green)">再次完全一致</strong>。',
+     '③ Each rank inserts only the unified <b>2 pages</b> into its L2 host tree (<code>_insert_helper_host</code>) → all 3 host trees are <strong style="color:var(--green)">identical again</strong>.'],
+    ['✅ 两套<strong>独立 gloo 组</strong>（组1 命中数、组2 完成数）+ 每 batch 恒 1 个 ack → 各 rank 对 host tree 的<strong>插入/删除完全一致</strong> → <b style="color:var(--green)">host radix tree 始终一致，后台 collective 不会死锁</b>。',
+     '✅ Two <strong>independent gloo group-sets</strong> (set 1 hit count, set 2 completed tokens) + exactly one ack per batch → every rank\u2019s <strong>inserts/deletes are identical</strong> → <b style="color:var(--green)">host radix trees stay consistent and background collectives never deadlock</b>.'],
+    // buttons
+    ['⏸ 暂停','⏸ Pause'],['▶ 播放','▶ Play'],['⟲ 重播','⟲ Replay'],['重置','Reset'],
+    ['▶ 1 套组（死锁）','▶ 1 group set (deadlock)'],['▶ 2 套组（安全）','▶ 2 group sets (safe)'],
+    // tab1 legend + tree title + init caption
+    ['<span class="sw" style="background:var(--amber)"></span>命中数被截断（不一致）','<span class="sw" style="background:var(--amber)"></span>hit count truncated (inconsistent)'],
+    ['<span class="sw" style="background:var(--blue)"></span>TP 组内 MIN 后','<span class="sw" style="background:var(--blue)"></span>after MIN within TP group'],
+    ['<span class="sw" style="background:var(--green)"></span>PP 组内 MIN 后（全局一致）','<span class="sw" style="background:var(--green)"></span>after MIN within PP group (global)'],
+    ['所有 24 个 rank 共享同一棵 radix tree','all 24 ranks share one radix tree'],
+    ['自动播放中…','auto-playing…'],
+    // tab1 captions
+    ['拓扑 <b>PP=3 × TP=8 = 24 个 rank</b>：每个 PP stage 下挂 8 个 TP rank。',
+     'Topology <b>PP=3 × TP=8 = 24 ranks</b>: each PP stage holds 8 TP ranks.'],
+    ['① 各 rank <span class="k">独立</span>向 L3 查询前缀命中。<b style="color:var(--amber)">注意 r10、r15 因 host 内存压力被截断</b>（6 / 7 页）。',
+     '① Each rank <span class="k">independently</span> queries L3 for prefix hits. <b style="color:var(--amber)">Note r10 & r15 are truncated by host-memory pressure</b> (6 / 7 pages).'],
+    ['② 若各 rank 按自己的命中数建 radix tree → 树高不一致 → 后续 PP 集合通信 <b style="color:var(--red)">shape mismatch → crash</b>。',
+     '② If each rank builds its radix tree from its own hit count → tree heights differ → next PP collective <b style="color:var(--red)">shape mismatch → crash</b>.'],
+    ['③ 第一步：在 <span class="k">TP 组（每一行 8 个 rank）</span>内 <code>all_reduce(MIN)</code>。',
+     '③ Step 1: <code>all_reduce(MIN)</code> within each <span class="k">TP group (a row of 8 ranks)</span>.'],
+    ['③ TP 组归约后：<b>每一行变得一致</b>（PP0=8, PP1=6, PP2=8 = 各行最小值）。',
+     '③ After TP reduce: <b>each row is uniform</b> (PP0=8, PP1=6, PP2=8 = per-row min).'],
+    ['④ 第二步：在 <span class="k">PP 组（每一列 3 个 rank）</span>内 <code>all_reduce(MIN)</code> → 收敛到全局最小值。',
+     '④ Step 2: <code>all_reduce(MIN)</code> within each <span class="k">PP group (a column of 3 ranks)</span> → converge to the global minimum.'],
+    ['④ PP 组归约后：<b style="color:var(--green)">全部 24 个 rank 命中数 = 6</b>（最长公共前缀）。',
+     '④ After PP reduce: <b style="color:var(--green)">hit count = 6 on all 24 ranks</b> (longest common prefix).'],
+    ['⑤ 所有 rank 都只 prefetch / 建树到 6 → <span style="color:var(--green)">24 个 rank 的 radix tree 完全一致 ✓</span>',
+     '⑤ Every rank prefetches / builds the tree only up to 6 → <span style="color:var(--green)">all 24 radix trees are identical ✓</span>'],
+    // tab2 legend + note + groups + init + captions + banners
+    ['<span class="sw" style="background:var(--purple)"></span><b>prefetch_thread</b>（独立后台线程）· reduce(storage_hit_count)','<span class="sw" style="background:var(--purple)"></span><b>prefetch_thread</b> (independent background thread) · reduce(storage_hit_count)'],
+    ['<span class="sw" style="background:var(--cyan)"></span><b>prefetch_sync_thread</b>（独立后台线程）· reduce(completed_tokens)','<span class="sw" style="background:var(--cyan)"></span><b>prefetch_sync_thread</b> (independent background thread) · reduce(completed_tokens)'],
+    ['每个 cell = 1 个 rank，内含 2 个独立后台线程（小圆点 ●A ●B）。每一行是一个 <b>TP communicator</b>，每一列是一个 <b>PP communicator</b>。',
+     'Each cell = 1 rank, holding 2 independent background threads (dots ●A ●B). Each row is a <b>TP communicator</b>, each column a <b>PP communicator</b>.'],
+    ['<b>prefetch_hits_sync_groups</b><br>命中页数归约组（含 TP 环 + PP 环）<br><span style="font-size:11px">reduce(storage_hit_count)</span>',
+     '<b>prefetch_hits_sync_groups</b><br>hit-count reduce set (TP rings + PP rings)<br><span style="font-size:11px">reduce(storage_hit_count)</span>'],
+    ['<b>prefetch_completion_sync_groups</b><br>完成 token 归约组（含 TP 环 + PP 环）<br><span style="font-size:11px">reduce(completed_tokens)</span>',
+     '<b>prefetch_completion_sync_groups</b><br>completed-token reduce set (TP rings + PP rings)<br><span style="font-size:11px">reduce(completed_tokens)</span>'],
+    ['选择场景：用 <b>1 套组</b> 会死锁，用 <b>2 套组</b> 则安全。','Pick a scenario: <b>1 group set</b> deadlocks, <b>2 group sets</b> are safe.'],
+    ['只有 <b>1 套组</b>：prefetch_thread(A) 与 prefetch_sync_thread(B) 共用同一个 communicator 集。',
+     'Only <b>1 group set</b>: prefetch_thread(A) and prefetch_sync_thread(B) share the same communicator set.'],
+    ['两个后台线程<b>独立调度、顺序不定</b>：同一个 TP 环里，有的 rank 先发 A，有的先发 B。',
+     'The two background threads are <b>scheduled independently, order unpredictable</b>: within one TP ring some ranks post A first, others post B first.'],
+    ['同一个 communicator 上各 rank 提交的 collective <b style="color:var(--red)">不是同一个</b>（A 与 B 错位）→ rendezvous 永远配不上。',
+     'On the same communicator the collectives submitted by different ranks are <b style="color:var(--red)">not the same</b> (A vs B misaligned) → rendezvous never matches.'],
+    ['只要任一 communicator 上 A/B 交错，该环就死锁 → 全局 PP/TP 通信连环卡住。',
+     'If A/B interleave on any communicator, that ring deadlocks → all PP/TP communication hangs in a chain.'],
+    ['💥 DEADLOCK — 整个 24-rank job 卡死','💥 DEADLOCK — the whole 24-rank job hangs'],
+    ['用 <b>2 套独立组</b>：<b style="color:var(--purple)">A 永远走 prefetch_hits_sync_groups</b>，<b style="color:var(--cyan)">B 永远走 prefetch_completion_sync_groups</b>。',
+     'With <b>2 independent group sets</b>: <b style="color:var(--purple)">A always uses prefetch_hits_sync_groups</b>, <b style="color:var(--cyan)">B always uses prefetch_completion_sync_groups</b>.'],
+    ['第一波：所有 rank 的 <b>prefetch_thread</b> 只在 <code>prefetch_hits_sync_groups</code> 上提交 A → 序列一致。',
+     'Wave 1: every rank\u2019s <b>prefetch_thread</b> posts A only on <code>prefetch_hits_sync_groups</code> → consistent order.'],
+    ['✓ TP 环 + PP 环上 A 全部到齐 → 第一波归约完成。','✓ A arrives on every TP ring + PP ring → wave 1 reduce done.'],
+    ['第二波：所有 rank 的 <b>prefetch_sync_thread</b> 只在 <code>prefetch_completion_sync_groups</code> 上提交 B → 序列一致。',
+     'Wave 2: every rank\u2019s <b>prefetch_sync_thread</b> posts B only on <code>prefetch_completion_sync_groups</code> → consistent order.'],
+    ['每个 communicator 上的 collective 序列在所有 rank <b style="color:var(--green)">完全一致</b>（A→组1、B→组2 不交叉）→ 不会死锁。',
+     'The collective sequence on each communicator is <b style="color:var(--green)">identical across ranks</b> (A→set1, B→set2, never crossing) → no deadlock.'],
+    ['✅ 安全 — 24 个 rank 全部对齐完成','✅ Safe — all 24 ranks aligned and complete'],
+    // tab3 titles / conduits / flow-note / lane hint / captions / formers
+    ['③ 主 PP 流水线执行<strong>时序</strong> <span class="tag gpu">NCCL · GPU</span> <span style="color:var(--muted);font-size:11px;">时序连续、错峰流动，<strong style="color:var(--green)">不被后台 prefetch 同步打断</strong></span>',
+     '③ Main PP pipeline execution <strong>timing</strong> <span class="tag gpu">NCCL · GPU</span> <span style="color:var(--muted);font-size:11px;">continuous, staggered flow, <strong style="color:var(--green)">never interrupted by background prefetch sync</strong></span>'],
+    ['↑ 流水线跑的正是②组好的 <strong>mb0→mb3</strong>，沿 stage0→1→2 错峰对角推进',
+     '↑ The pipeline runs exactly the <strong>mb0→mb3</strong> composed in ②, advancing diagonally stage0→1→2'],
+    ['▲ 组好的 <b>batch &amp; micro-batch 顺序</b> 喂给流水线（内容）','▲ The composed <b>batch &amp; micro-batch order</b> feeds the pipeline (content)'],
+    ['② 三个 PP rank 用<strong>同一个 storage hit</strong> 组 batch（内容必须逐 rank 一致）',
+     '② The three PP ranks compose the batch from <strong>the same storage hit</strong> (content must match per rank)'],
+    ['▲ <code>all_reduce(MIN)</code> 输出统一值 <b>6</b> → 决定 batch size','▲ <code>all_reduce(MIN)</code> outputs the unified value <b>6</b> → determines batch size'],
+    ['① 异步 prefetch 查询 → <code>all_reduce(MIN)</code> <span class="tag cpu">gloo · CPU 后台线程</span>',
+     '① Async prefetch query → <code>all_reduce(MIN)</code> <span class="tag cpu">gloo · CPU background thread</span>'],
+    ['（等待②组好的 batch…）','(waiting for batch from ②…)'],
+    ['① 三个 PP rank 的 prefetch 查询<strong>异步发起</strong>（到达时刻不同）。','① The three PP ranks issue prefetch queries <strong>asynchronously</strong> (different arrival times).'],
+    ['① 先到的 rank 在 <span class="k">gloo CPU 后台线程</span>上<b style="color:var(--amber)">等待对齐</b>（不占 GPU）。',
+     '① Earlier ranks <b style="color:var(--amber)">wait to align</b> on the <span class="k">gloo CPU background thread</span> (no GPU use).'],
+    ['① <b style="color:var(--amber)">pp2 最慢</b>到达 → <code>all_reduce(MIN)</code> 把 8/6/7 <strong style="color:var(--green)">统一成 6</strong>。',
+     '① <b style="color:var(--amber)">pp2 is slowest</b> to arrive → <code>all_reduce(MIN)</code> unifies 8/6/7 <strong style="color:var(--green)">into 6</strong>.'],
+    ['② 统一后的 <b>storage hit = 6</b> 下发给各 rank 调度器 → 决定<strong>已缓存前缀长度 / batch size / micro-batch 顺序</strong>（mb0→mb3）。',
+     '② The unified <b>storage hit = 6</b> goes to each rank\u2019s scheduler → determines <strong>cached prefix length / batch size / micro-batch order</strong> (mb0→mb3).'],
+    ['③ 三个 rank 因拿到<strong style="color:var(--green)">同一个 6</strong> 而组出<strong style="color:var(--green)">完全一致的 batch 与 mb 顺序</strong>，喂给 PP 流水线；执行时序连续不被打断。<br><span style="color:var(--red)">⚠ 若 storage hit 不统一 → batch/mb 顺序逐 rank 发散 → PP 调度错位、卡死。</span>',
+     '③ Because all three ranks get <strong style="color:var(--green)">the same 6</strong>, they compose <strong style="color:var(--green)">identical batches and mb order</strong> fed to the PP pipeline; timing stays continuous.<br><span style="color:var(--red)">⚠ If storage hit weren\u2019t unified → batch/mb order diverges per rank → PP scheduling mismatch & hang.</span>'],
+    // formers
+    ['调度器 · PP rank 0','Scheduler · PP rank 0'],['调度器 · PP rank 1','Scheduler · PP rank 1'],['调度器 · PP rank 2','Scheduler · PP rank 2'],
+    ['已缓存前缀 storage hit = ','cached prefix storage hit = '],
+    [' 页 → 决定 batch 组成',' pages → determines batch'],
+    ['✓ batch & mb 顺序一致','✓ identical batch & mb order'],
+    // mesh labels
+    ['<b>PP stage 0</b><br>(TP 组)','<b>PP stage 0</b><br>(TP group)'],
+    ['<b>PP stage 1</b><br>(TP 组)','<b>PP stage 1</b><br>(TP group)'],
+    ['<b>PP stage 2</b><br>(TP 组)','<b>PP stage 2</b><br>(TP group)'],
+    ['PP 组(列)<br>每列跨 3 个 stage →','PP groups (cols)<br>each spans 3 stages →'],
+    // tab4 flow boxes
+    ['调度器 Scheduler <span class="pin">主线程</span>','Scheduler <span class="pin">main thread</span>'],
+    ['发起 prefetch 请求（writeback / load）','Issues prefetch requests (writeback / load)'],
+    ['▼ <b>prefetch_queue</b>（PrefetchOperation）','▼ <b>prefetch_queue</b> (PrefetchOperation)'],
+    ['① prefetch_thread <span class="pin">storage-hit 线程</span>','① prefetch_thread <span class="pin">storage-hit thread</span>'],
+    ['<code>_storage_hit_query()</code> 查询 L3 命中页数；命中足够→放 prefetch_buffer，不足→prefetch_revoke_queue',
+     '<code>_storage_hit_query()</code> queries L3 hit pages; enough hits → prefetch_buffer, too few → prefetch_revoke_queue'],
+    ['◆ all_reduce(MIN) storage_hit_count <small>@ prefetch_hits_sync_groups（组1，gloo/CPU，含 TP 环 + PP 环）</small>',
+     '◆ all_reduce(MIN) storage_hit_count <small>@ prefetch_hits_sync_groups (set 1, gloo/CPU, TP rings + PP rings)</small>'],
+    ['▼ <b>prefetch_buffer</b>','▼ <b>prefetch_buffer</b>'],
+    ['② prefetch_io_aux_thread <span class="pin">IO 加载线程</span>','② prefetch_io_aux_thread <span class="pin">IO load thread</span>'],
+    ['<code>_page_transfer()</code> 逐 batch 把页从 L3 读入 host；累加 <b>completed_tokens</b>；<b>每个 storage batch 产生 1 个 PrefetchAck</b>（出错也照常产生）',
+     '<code>_page_transfer()</code> loads pages L3→host batch by batch; accumulates <b>completed_tokens</b>; <b>each storage batch emits exactly 1 PrefetchAck</b> (even on error)'],
+    ['▼ <b>prefetch_sync_queue</b>（PrefetchAck）','▼ <b>prefetch_sync_queue</b> (PrefetchAck)'],
+    ['③ prefetch_sync_thread <span class="pin">completion-token 线程</span>','③ prefetch_sync_thread <span class="pin">completion-token thread</span>'],
+    ['对每个 ack 的 <b>completed_tokens</b> 做归约','Reduces <b>completed_tokens</b> of every ack'],
+    ['◆ all_reduce(MIN) completed_tokens <small>@ prefetch_completion_sync_groups（组2，gloo/CPU，含 TP 环 + PP 环）</small>',
+     '◆ all_reduce(MIN) completed_tokens <small>@ prefetch_completion_sync_groups (set 2, gloo/CPU, TP rings + PP rings)</small>'],
+    ['▼ <b>ack_prefetch_queue</b>','▼ <b>ack_prefetch_queue</b>'],
+    ['调度器写入 host radix tree','Scheduler inserts into host radix tree'],
+    ['只插入 <b>completed_tokens</b> 长度的前缀 → <code>_insert_helper_host()</code>','Inserts only the <b>completed_tokens</b>-long prefix → <code>_insert_helper_host()</code>'],
+    ['为什么 MIN(storage_hit) 一致？','Why does MIN(storage_hit) ensure consistency?'],
+    ['各 rank 命中可能不同（host 内存截断、L3 视图差异）。MIN 取<b>最长公共可命中前缀</b> → 所有 rank <b>抓取范围一致</b>，不会各抓不同长度。',
+     'Hits may differ per rank (host-mem truncation, L3 view differences). MIN takes the <b>longest common hittable prefix</b> → every rank <b>fetches the same range</b>, never different lengths.'],
+    ['为什么 MIN(completed_tokens) 一致？','Why does MIN(completed_tokens) ensure consistency?'],
+    ['即便抓取范围一致，实际逐页加载仍可能<b>部分失败</b>（<code>page_get</code> 返回 n≠batch）。MIN 只提交<b>所有 rank 都成功落盘的最长公共前缀</b> → 写入 host tree 的长度逐 rank 相同。',
+     'Even with the same fetch range, per-page loads can <b>partially fail</b> (<code>page_get</code> returns n≠batch). MIN commits only the <b>longest common prefix every rank loaded successfully</b> → identical insert length per rank.'],
+    ['为什么不会 hang？','Why no hang?'],
+    ['每个 storage batch <b>都产生且仅产生一个 PrefetchAck</b>（即使出错也照常产生）→ 每个 rank 参与的 reduce <b>次数严格相等</b>，collective 一一对齐。两个 MIN 一起保证：<b>插入 host tree 的前缀逐 rank 完全相同 → 树一致</b>。',
+     'Each storage batch <b>emits exactly one PrefetchAck</b> (even on error) → every rank joins the <b>same number of reduces</b>, collectives align one-to-one. The two MINs together guarantee: <b>the prefix inserted into the host tree is identical per rank → trees are consistent</b>.'],
+    ['两个 MIN 同步点（组1 命中数、组2 完成数）+ 每 batch 恒定 1 个 ack，共同保证 PP 各 rank 的 host radix tree 严格一致。',
+     'Two MIN sync points (set 1 = hit count, set 2 = completed tokens) + exactly one ack per batch together keep every PP rank\u2019s host radix tree strictly identical.'],
+    // tab4 animated step captions
+    ['沿数据流向下逐步点亮：两个 MIN 同步点 + 每 batch 恒定 1 个 ack。',
+     'Light up step by step along the data flow: two MIN sync points + exactly one ack per batch.'],
+    ['调度器主线程把 prefetch 请求（writeback / load）放入队列，触发后台流水线。',
+     'The scheduler main thread enqueues a prefetch request (writeback / load), kicking off the background pipeline.'],
+    ['<b>PrefetchOperation</b> 入队 <code>prefetch_queue</code>，交给后台线程处理。',
+     'A <b>PrefetchOperation</b> enters <code>prefetch_queue</code>, handed to the background threads.'],
+    ['① <b>prefetch_thread</b> 调 <code>_storage_hit_query()</code> 查询 L3 命中页数（各 rank 可能不同）。',
+     '① <b>prefetch_thread</b> calls <code>_storage_hit_query()</code> to query L3 hit pages (may differ per rank).'],
+    ['◆ <b style="color:var(--amber)">第一个 MIN</b>：在 <code>prefetch_hits_sync_groups</code>（组1）对 storage_hit_count 取最小 → <b>抓取范围逐 rank 一致</b>。',
+     '◆ <b style="color:var(--amber)">First MIN</b>: take the min of storage_hit_count on <code>prefetch_hits_sync_groups</code> (set 1) → <b>the fetch range is identical per rank</b>.'],
+    ['命中足够的请求落入 <code>prefetch_buffer</code>，进入实际 IO 加载。',
+     'Requests with enough hits drop into <code>prefetch_buffer</code> for the actual IO load.'],
+    ['② <b>prefetch_io_aux_thread</b> 用 <code>_page_transfer()</code> 逐 batch 把页 L3→host；<b>每个 batch 恒产生 1 个 PrefetchAck</b>（出错也产生）。',
+     '② <b>prefetch_io_aux_thread</b> uses <code>_page_transfer()</code> to move pages L3→host batch by batch; <b>each batch always emits exactly one PrefetchAck</b> (even on error).'],
+    ['每个 batch 的 <b>PrefetchAck</b> 入队 <code>prefetch_sync_queue</code>。',
+     'Each batch\u2019s <b>PrefetchAck</b> enters <code>prefetch_sync_queue</code>.'],
+    ['③ <b>prefetch_sync_thread</b> 对每个 ack 的 <b>completed_tokens</b> 做归约。',
+     '③ <b>prefetch_sync_thread</b> reduces the <b>completed_tokens</b> of every ack.'],
+    ['◆ <b style="color:var(--green)">第二个 MIN</b>：在 <code>prefetch_completion_sync_groups</code>（组2）对 completed_tokens 取最小 → <b>真正落盘前缀逐 rank 一致</b>。',
+     '◆ <b style="color:var(--green)">Second MIN</b>: take the min of completed_tokens on <code>prefetch_completion_sync_groups</code> (set 2) → <b>the actually-loaded prefix is identical per rank</b>.'],
+    ['统一后的结果入队 <code>ack_prefetch_queue</code> 回到调度器。',
+     'The unified result enters <code>ack_prefetch_queue</code> back to the scheduler.'],
+    ['调度器只插入 <b>completed_tokens</b> 长度的前缀 → <code>_insert_helper_host()</code>。每 batch 恒 1 个 ack，<b>reduce 次数严格相等 → 不会 hang</b>。',
+     'The scheduler inserts only the <b>completed_tokens</b>-long prefix → <code>_insert_helper_host()</code>. One ack per batch means <b>equal reduce counts → no hang</b>.'],
+    ['✅ 闭环：两个 MIN（组1 命中数 + 组2 完成数）+ 每 batch 1 个 ack → <b style="color:var(--green)">PP 各 rank 的 host radix tree 严格一致</b>。',
+     '✅ Closed loop: two MINs (set 1 hit count + set 2 completed tokens) + one ack per batch → <b style="color:var(--green)">every PP rank\u2019s host radix tree is strictly identical</b>.'],
+  ];
+
+  const SEL = ['header h1','header p','.tab','.tree-title','.caption','.banner','.legend .chip',
+    '#scene2 .note','.grp','.t3-title','.clabel','.flow-note','.lane-hint','.ctl',
+    '.pp-label','.pp-foot .lab','.former h5','.former .hitbox .ht1','.former .hitbox .ht2','.former .chk',
+    '.tbox .tname','.tbox .tdesc','.tarrow','.minnode','.whycard h4','.whycard p',
+    '.gpu-badge span','.l3lab','.syncbadge','.consist-flag','#scene6 .note','.barlabel'].join(',');
+
+  const tmp=document.createElement('div');
+  const strip=h=>{ tmp.innerHTML=h; return tmp.textContent.replace(/\s+/g,' ').trim(); };
+  const EN={}, ZH={};
+  PAIRS.forEach(([zh,en])=>{ EN[strip(zh)]=en; ZH[strip(en)]=zh; });
+
+  let LANG='zh';
+  // runtime helper for dynamic strings that contain interpolated values
+  // (cannot be matched by the static dictionary). Reads the live LANG.
+  window.TR=(zh,en)=> LANG==='en' ? en : zh;
+  let mo=null;
+  let suppress=false;   // re-entrancy guard: ignore mutations we cause ourselves
+  function translateEl(el){
+    const k=strip(el.innerHTML);
+    const next = LANG==='en' ? EN[k] : ZH[k];
+    // only write when there is a real change, otherwise we churn the DOM
+    if(next!==undefined && next!==el.innerHTML) el.innerHTML=next;
+  }
+  function translateAll(){
+    suppress=true;
+    document.querySelectorAll(SEL).forEach(translateEl);
+    if(mo) mo.takeRecords();   // drop the records our own writes just generated
+    suppress=false;
+  }
+
+  window.toggleLang=function(){
+    LANG = LANG==='zh' ? 'en' : 'zh';
+    document.getElementById('langBtn').textContent = LANG==='zh' ? 'EN' : '中文';
+    document.documentElement.lang = LANG==='zh' ? 'zh-CN' : 'en';
+    translateAll();
+  };
+
+  // keep dynamic captions translated as JS rewrites them
+  mo=new MutationObserver(muts=>{
+    if(suppress) return;       // skip the mutations our own translations produced
+    suppress=true;
+    muts.forEach(m=>{
+      const tgt = m.target.nodeType===1 ? m.target : m.target.parentElement;
+      if(!tgt) return;
+      const c = tgt.closest && tgt.closest(SEL);
+      if(c) translateEl(c);
+    });
+    mo.takeRecords();
+    suppress=false;
+  });
+  mo.observe(document.body,{subtree:true,childList:true,characterData:true});
+})();
+</script>
+</body>
+</html>
+<style>#langBtn,header,.tabs{display:none!important;}body{background:#0e1117;}.wrap{padding-top:10px;}</style>
+<script>
+(function(){
+  // English-only, single-tab embed: reuse all original JS
+  try{ if(window.toggleLang) toggleLang(); }catch(e){}   // zh -> en
+  var TAB="skew";
+  var btn=document.querySelector('.tab[data-tab="'+TAB+'"]');
+  if(btn){ btn.click(); }
+})();
+</script>
diff --git a/public/images/blog/pp_hicache_consistency/hicache_pp_animation_en_threads.html b/public/images/blog/pp_hicache_consistency/hicache_pp_animation_en_threads.html
new file mode 100644
index 000000000..b1449776a
--- /dev/null
+++ b/public/images/blog/pp_hicache_consistency/hicache_pp_animation_en_threads.html
@@ -0,0 +1,1559 @@
+<!DOCTYPE html>
+<html lang="zh-CN">
+<head>
+<meta charset="UTF-8" />
+<meta name="viewport" content="width=device-width, initial-scale=1.0" />
+<title>HiCache × PP=3 · TP=8: Tree Consistency & Deadlock Prevention Animation</title>
+<style>
+  :root{
+    --bg:#0e1117; --panel:#161b22; --panel2:#1c2330; --line:#30363d;
+    --text:#e6edf3; --muted:#8b949e;
+    --blue:#58a6ff; --green:#3fb950; --red:#f85149; --amber:#d29922;
+    --purple:#bc8cff; --cyan:#56d4dd;
+  }
+  *{box-sizing:border-box;}
+  body{
+    margin:0; background:radial-gradient(1200px 600px at 50% -10%, #18202c, var(--bg));
+    color:var(--text); font-family:-apple-system,BlinkMacSystemFont,"Segoe UI","PingFang SC","Microsoft YaHei",sans-serif;
+    line-height:1.5;
+  }
+  #langBtn{ position:fixed; top:14px; right:16px; z-index:50; background:var(--panel2); border:1px solid var(--blue);
+    color:var(--blue); padding:7px 14px; border-radius:999px; cursor:pointer; font-size:13px; font-weight:600; }
+  #langBtn:hover{ background:var(--blue); color:#04101f; }
+  header{ text-align:center; padding:20px 16px 4px; }
+  header h1{ margin:0 0 4px; font-size:21px; }
+  header p{ margin:0; color:var(--muted); font-size:13px; }
+  .tabs{ display:flex; gap:8px; justify-content:center; margin:16px auto 8px; flex-wrap:wrap; }
+  .tab{ background:var(--panel); border:1px solid var(--line); color:var(--text);
+    padding:9px 16px; border-radius:999px; cursor:pointer; font-size:14px; transition:all .15s; }
+  .tab.active{ background:var(--blue); color:#04101f; border-color:var(--blue); font-weight:600; }
+  .wrap{ max-width:1120px; margin:0 auto; padding:0 16px 60px; }
+  .scene{ background:var(--panel); border:1px solid var(--line); border-radius:14px; padding:18px; position:relative; }
+  .hidden{ display:none; }
+  .controls{ display:flex; gap:10px; justify-content:center; align-items:center; margin:14px 0 4px; flex-wrap:wrap; }
+  button.ctl{ background:var(--panel2); border:1px solid var(--line); color:var(--text);
+    padding:8px 16px; border-radius:8px; cursor:pointer; font-size:14px; }
+  button.ctl:hover{ border-color:var(--blue); }
+  button.ctl.primary{ background:var(--green); color:#04140a; border-color:var(--green); font-weight:600; }
+  button.ctl.alt{ background:var(--red); color:#1a0606; border-color:var(--red); font-weight:600; }
+  .caption{ text-align:center; min-height:44px; margin:10px auto 0; max-width:900px; font-size:15px; }
+  .caption .k{ color:var(--cyan); font-weight:600; }
+  code{ background:#0d1117; padding:1px 6px; border-radius:4px; border:1px solid var(--line); color:var(--cyan); font-size:12px; }
+  .legend{ display:flex; gap:18px; justify-content:center; font-size:12px; color:var(--muted); flex-wrap:wrap; margin-bottom:10px; }
+  .chip{ display:inline-flex; align-items:center; gap:6px; }
+  .sw{ width:14px; height:14px; border-radius:4px; display:inline-block; }
+
+  /* ---------- mesh ---------- */
+  .mesh-head{ display:flex; align-items:center; gap:8px; margin-left:96px; margin-bottom:4px; }
+  .tp-hdr{ flex:1; display:flex; gap:6px; }
+  .tp-hdr .th{ flex:1; text-align:center; font-size:11px; color:var(--muted); }
+  .pp-row{ display:flex; align-items:center; gap:8px; margin-bottom:6px; }
+  .pp-label{ width:88px; font-size:12px; color:var(--muted); text-align:right; line-height:1.2; }
+  .pp-label b{ color:var(--text); }
+  .row-cells{ flex:1; display:flex; gap:6px; border:2px solid transparent; border-radius:10px; padding:3px; transition:border-color .3s, box-shadow .3s; }
+  .row-cells.ring-a{ border-color:var(--purple); box-shadow:0 0 12px rgba(188,140,255,.25); }
+  .row-cells.ring-b{ border-color:var(--cyan); box-shadow:0 0 12px rgba(86,212,221,.25); }
+  .row-cells.ring-bad{ border-color:var(--red); box-shadow:0 0 12px rgba(248,81,73,.3); }
+  .cell{
+    flex:1; height:54px; border-radius:8px; background:#0d1117; border:1px solid var(--line);
+    display:flex; flex-direction:column; align-items:center; justify-content:center; gap:1px;
+    transition:background .3s, border-color .3s, transform .15s, box-shadow .15s; position:relative;
+  }
+  .cell .v{ font-size:18px; font-weight:800; color:var(--muted); transition:color .3s; }
+  .cell .v small{ font-size:9px; font-weight:500; }
+  .cell .rk{ font-size:9px; color:#5b6470; }
+  .cell.varied .v{ color:var(--amber); }
+  .cell.tpmin{ background:linear-gradient(180deg,#143055,#102844); border-color:var(--blue); }
+  .cell.tpmin .v{ color:#cfe6ff; }
+  .cell.gmin{ background:linear-gradient(180deg,#0f3a1d,#0e2c18); border-color:var(--green); }
+  .cell.gmin .v{ color:#c4f7d4; }
+  .cell.sweep{ transform:translateY(-3px); box-shadow:0 0 14px rgba(88,166,255,.55); border-color:var(--blue); }
+  .cell.bad{ background:linear-gradient(180deg,#3a1414,#2a1010); border-color:var(--red); }
+  .cell.bad .v{ color:#ffd4d0; }
+  .cell.dim{ opacity:.35; }
+  /* thread dots */
+  .tdots{ display:flex; gap:5px; margin-top:1px; }
+  .td{ width:9px; height:9px; border-radius:50%; border:1px solid var(--line); background:#0d1117; transition:all .25s; }
+  .td.a{ border-color:var(--purple); }
+  .td.b{ border-color:var(--cyan); }
+  .td.a.on{ background:var(--purple); box-shadow:0 0 8px var(--purple); }
+  .td.b.on{ background:var(--cyan); box-shadow:0 0 8px var(--cyan); }
+  .td.done{ background:var(--green); border-color:var(--green); box-shadow:0 0 6px var(--green); }
+  .td.dead{ background:var(--red); border-color:var(--red); box-shadow:0 0 6px var(--red); }
+
+  /* pp-group footer (columns) */
+  .pp-foot{ display:flex; align-items:center; gap:8px; margin-top:6px; }
+  .pp-foot .lab{ width:88px; font-size:11px; color:var(--muted); text-align:right; }
+  .pp-foot .cols{ flex:1; display:flex; gap:6px; padding:0 3px; }
+  .pp-foot .col{ flex:1; height:20px; border-radius:6px; border:1px dashed var(--line); font-size:9px;
+    color:#5b6470; display:flex; align-items:center; justify-content:center; transition:all .3s; }
+  .pp-foot .col.ring-a{ border-color:var(--purple); color:#e3d3ff; }
+  .pp-foot .col.ring-b{ border-color:var(--cyan); color:#cdf6fa; }
+  .pp-foot .col.ring-bad{ border-color:var(--red); color:#ffd4d0; }
+
+  /* shared tree (tab1) */
+  .tree-box{ margin-top:14px; display:flex; flex-direction:column; align-items:center; }
+  .tree-title{ font-size:12px; color:var(--muted); margin-bottom:6px; }
+  .tree{ display:flex; flex-direction:column; align-items:center; gap:4px; min-height:30px; }
+  .tnode{ width:220px; height:20px; border-radius:5px; background:#0d1117; border:1px solid var(--line);
+    display:flex; align-items:center; justify-content:center; font-size:11px; color:var(--muted);
+    opacity:0; transform:translateY(-6px); transition:opacity .25s, transform .25s; }
+  .tnode.show{ opacity:1; transform:none; background:linear-gradient(90deg,#0f3a1d,#196b32); border-color:var(--green); color:#c4f7d4; }
+
+  /* groups panel (tab2) */
+  .groups{ display:flex; gap:16px; justify-content:center; margin-top:12px; flex-wrap:wrap; }
+  .grp{ border:1px dashed var(--line); border-radius:10px; padding:8px 14px; font-size:12px; color:var(--muted);
+    min-width:230px; text-align:center; transition:all .3s; }
+  .grp b{ color:var(--text); }
+  .grp.g1.hot{ border-color:var(--purple); color:#e3d3ff; box-shadow:0 0 14px rgba(188,140,255,.25); }
+  .grp.g2.hot{ border-color:var(--cyan); color:#cdf6fa; box-shadow:0 0 14px rgba(86,212,221,.25); }
+  .banner{ text-align:center; font-weight:800; font-size:18px; min-height:24px; margin-top:8px; }
+  .banner.ok{ color:var(--green); } .banner.bad{ color:var(--red); }
+  .note{ font-size:12px; color:var(--muted); text-align:center; margin-top:4px; }
+
+  /* ---------- tab3: async skew x MIN ---------- */
+  .t3-section{ background:var(--panel2); border:1px solid var(--line); border-radius:12px; padding:10px 12px; margin-bottom:14px; }
+  .t3-title{ font-size:13px; margin-bottom:8px; display:flex; align-items:center; gap:8px; }
+  .t3-title .tag{ font-size:10px; padding:2px 8px; border-radius:999px; border:1px solid var(--line); color:var(--muted); }
+  .t3-title .tag.gpu{ border-color:var(--blue); color:#cfe6ff; }
+  .t3-title .tag.cpu{ border-color:var(--amber); color:#ffe2ab; }
+  /* top pipeline */
+  .pipe{ position:relative; }
+  .lane{ position:relative; height:34px; margin:6px 0; border-radius:8px; background:#0d1117;
+    border:1px solid var(--line); overflow:hidden; }
+  .lane .lname{ position:absolute; left:8px; top:50%; transform:translateY(-50%); font-size:11px; color:var(--muted); z-index:3; }
+  .mb{ position:absolute; top:6px; height:22px; width:52px; border-radius:6px; z-index:2;
+    display:flex; align-items:center; justify-content:center; font-size:10px; font-weight:700;
+    opacity:0; color:#c4f7d4; border:1px solid var(--green);
+    background:linear-gradient(180deg,#0f3a1d,#15311f); transition:opacity .18s; }
+  .lane-hint{ position:absolute; right:10px; top:50%; transform:translateY(-50%); font-size:10px; color:#46505e; z-index:1; }
+  .flow-note{ font-size:11px; color:var(--green); text-align:right; margin-top:2px; }
+
+  /* bottom sync area */
+  .sync-area{ position:relative; height:170px; border-radius:8px; background:#0d1117; border:1px solid var(--line); overflow:hidden; }
+  .barrier{ position:absolute; top:6px; bottom:6px; width:0; border-left:2px dashed var(--amber); z-index:2; }
+  .barrier .blabel{ position:absolute; top:-2px; left:8px; font-size:11px; color:var(--amber); white-space:nowrap; }
+  .barrier.fire{ border-left-color:var(--green); box-shadow:-2px 0 18px rgba(63,185,80,.5); }
+  .slane{ position:absolute; left:0; right:0; height:1px; border-top:1px dashed #1f2733; }
+  .slabel{ position:absolute; left:8px; font-size:11px; color:var(--muted); z-index:3; transform:translateY(-50%); }
+  .pkt{ position:absolute; height:30px; width:108px; border-radius:8px; z-index:3; transform:translateY(-50%);
+    display:flex; align-items:center; justify-content:center; gap:6px; font-size:11px; font-weight:700;
+    border:1px solid var(--line); background:#11161f; transition:background .25s, border-color .25s, box-shadow .25s; }
+  .pkt .hv{ font-size:14px; }
+  .pkt.travel{ border-color:var(--blue); color:#cfe6ff; }
+  .pkt.wait{ border-color:var(--amber); color:#ffe2ab; background:#1d1808; animation:pulse 1s infinite; }
+  .pkt.unified{ border-color:var(--green); color:#c4f7d4; background:linear-gradient(180deg,#0f3a1d,#0e2c18); box-shadow:0 0 12px rgba(63,185,80,.35); }
+  .clock{ position:absolute; right:10px; top:8px; font-size:11px; color:var(--muted); z-index:4; }
+  /* causal arrows + batch formers */
+  .t3-conduit{ position:relative; height:38px; margin:-2px 0 8px; display:flex; align-items:center; justify-content:center; }
+  .t3-conduit .clabel{ font-size:12px; color:var(--muted); transition:color .3s; z-index:2; background:var(--panel); padding:0 8px; }
+  .t3-conduit.hot .clabel{ color:var(--green); }
+  .t3-conduit::before{ content:""; position:absolute; left:50%; top:4px; bottom:4px; width:2px; transform:translateX(-50%);
+    background:repeating-linear-gradient(to top, #2b3340 0 6px, transparent 6px 12px); }
+  .t3-conduit.hot::before{ background:repeating-linear-gradient(to top, var(--green) 0 6px, transparent 6px 12px); opacity:.5; }
+  .t3-conduit .spark{ position:absolute; left:50%; bottom:3px; width:11px; height:11px; border-radius:50%;
+    background:var(--green); opacity:0; box-shadow:0 0 12px var(--green); transform:translateX(-50%); z-index:3; }
+  .t3-conduit.hot .spark{ animation:rise .85s linear infinite; }
+  .t3-conduit.hot .spark.s2{ animation-delay:.42s; }
+  @keyframes rise{ 0%{opacity:0; transform:translate(-50%,0);} 15%{opacity:1;} 100%{opacity:0; transform:translate(-50%,-34px);} }
+  .pipe.fed .lane{ border-color:var(--green); box-shadow:0 0 10px rgba(63,185,80,.25); }
+  .formers{ display:flex; gap:14px; justify-content:center; flex-wrap:wrap; }
+  .former{ flex:1; min-width:240px; max-width:320px; background:#0d1117; border:1px solid var(--line);
+    border-radius:10px; padding:10px 12px; transition:border-color .3s, box-shadow .3s; }
+  .former h5{ margin:0 0 6px; font-size:13px; }
+  .former .hitbox{ font-size:12px; color:var(--muted); margin-bottom:8px; }
+  .former .hitbox b{ color:var(--amber); font-size:14px; }
+  .former.ready{ border-color:var(--green); box-shadow:0 0 12px rgba(63,185,80,.2); }
+  .former.ready .hitbox b{ color:var(--green); }
+  .mbrow{ display:flex; gap:6px; }
+  .mbchip{ flex:1; height:24px; border-radius:6px; background:#11161f; border:1px solid var(--line);
+    display:flex; align-items:center; justify-content:center; font-size:11px; color:var(--muted); opacity:.3; transition:all .25s; }
+  .mbchip.on{ opacity:1; background:linear-gradient(180deg,#143055,#102844); border-color:var(--blue); color:#cfe6ff; }
+  .mbchip.fixed{ opacity:1; background:linear-gradient(180deg,#0f3a1d,#0e2c18); border-color:var(--green); color:#c4f7d4; }
+  .former .chk{ font-size:12px; color:var(--green); margin-top:6px; min-height:16px; }
+
+  /* ---------- tab4: thread relationships ---------- */
+  .t4wrap{ display:flex; gap:18px; flex-wrap:wrap; }
+  .t4flow{ flex:2; min-width:340px; display:flex; flex-direction:column; align-items:stretch; gap:0; }
+  .t4why{ flex:1; min-width:280px; display:flex; flex-direction:column; gap:10px; }
+  .tbox{ border:1px solid var(--line); border-radius:10px; padding:10px 12px; background:#0d1117; position:relative; }
+  .tbox .tname{ font-size:14px; font-weight:700; display:flex; align-items:center; gap:8px; }
+  .tbox .tdesc{ font-size:11.5px; color:var(--muted); margin-top:3px; }
+  .tbox .pin{ font-size:10px; padding:1px 7px; border-radius:999px; border:1px solid var(--line); color:var(--muted); }
+  .tbox.thread-hit{ border-left:4px solid var(--purple); }
+  .tbox.thread-io{ border-left:4px solid var(--blue); }
+  .tbox.thread-sync{ border-left:4px solid var(--cyan); }
+  .tbox.sched{ border-left:4px solid var(--muted); background:#11161f; }
+  .tarrow{ text-align:center; color:var(--muted); font-size:11px; padding:5px 0; position:relative; }
+  .tarrow b{ color:var(--text); }
+  .minnode{ align-self:center; margin:6px 0; border:1.5px solid var(--amber); border-radius:999px;
+    padding:7px 16px; font-size:12px; color:#ffe2ab; background:#1d1808; font-weight:600; }
+  .minnode.g2{ border-color:var(--green); color:#c4f7d4; background:#0f2a18; }
+  .minnode small{ display:block; font-size:10px; color:var(--muted); font-weight:400; }
+  .whycard{ border:1px solid var(--line); border-radius:10px; padding:11px 13px; background:var(--panel2); }
+  .whycard h4{ margin:0 0 6px; font-size:13px; }
+  .whycard p{ margin:0; font-size:12px; color:var(--muted); }
+  .whycard.a{ border-left:4px solid var(--purple); }
+  .whycard.b{ border-left:4px solid var(--cyan); }
+  .whycard.c{ border-left:4px solid var(--green); }
+  .whycard code{ font-size:11px; }
+
+  /* ---------- tab4 animation states ---------- */
+  #scene4 .tbox,#scene4 .tarrow,#scene4 .minnode,#scene4 .whycard{ transition:all .3s; }
+  #scene4 .dimmed{ opacity:.3; }
+  .tbox.lit{ box-shadow:0 0 18px rgba(88,166,255,.45); transform:translateX(5px); }
+  .tbox.thread-hit.lit{ box-shadow:0 0 18px rgba(188,140,255,.55); }
+  .tbox.thread-io.lit{ box-shadow:0 0 18px rgba(88,166,255,.55); }
+  .tbox.thread-sync.lit{ box-shadow:0 0 18px rgba(86,212,221,.55); }
+  .tbox.sched.lit{ box-shadow:0 0 18px rgba(63,185,80,.4); }
+  .tarrow.lit{ color:var(--green); font-weight:700; }
+  .tarrow.lit b{ color:var(--green); }
+  .tarrow.lit::after{ content:" ●"; color:var(--green); animation:t4blink .7s infinite; }
+  @keyframes t4blink{ 0%,100%{opacity:.2;} 50%{opacity:1;} }
+  .minnode.lit{ box-shadow:0 0 22px rgba(210,153,34,.65); transform:scale(1.06); }
+  .minnode.g2.lit{ box-shadow:0 0 22px rgba(63,185,80,.65); }
+  .whycard.lit{ box-shadow:0 0 18px rgba(63,185,80,.35); transform:translateY(-3px); border-left-width:6px; }
+
+  /* ---------- tab5: two-request full lifecycle story ---------- */
+  #scene5 .story-top{ display:flex; gap:14px; align-items:stretch; margin-bottom:14px; }
+  .gpu-badge{ width:120px; border:1px solid var(--line); border-radius:12px; background:#0d1117;
+    display:flex; flex-direction:column; align-items:center; justify-content:center; gap:2px;
+    font-size:12px; color:var(--muted); transition:all .3s; }
+  .gpu-badge .ic{ font-size:22px; }
+  .gpu-badge.busy{ border-color:var(--blue); color:#cfe6ff; box-shadow:0 0 18px rgba(88,166,255,.45);
+    background:linear-gradient(180deg,#143055,#102844); animation:gpupulse 1s infinite; }
+  @keyframes gpupulse{ 0%,100%{box-shadow:0 0 12px rgba(88,166,255,.3);} 50%{box-shadow:0 0 22px rgba(88,166,255,.6);} }
+  .l3box{ flex:1; border:1px solid var(--line); border-radius:12px; background:#0d1117; padding:8px 12px; transition:all .3s; }
+  .l3box.hot{ border-color:var(--green); box-shadow:0 0 14px rgba(63,185,80,.25); }
+  .l3title{ font-size:12px; color:var(--muted); margin-bottom:6px; display:flex; justify-content:space-between; align-items:center; }
+  .l3title .badge{ font-size:10px; padding:1px 8px; border-radius:999px; border:1px solid var(--line); color:var(--muted); min-width:46px; text-align:center; }
+  .l3title .badge.miss{ border-color:var(--red); color:#ffd4d0; }
+  .l3title .badge.hit{ border-color:var(--green); color:#c4f7d4; }
+  .pagerow{ display:flex; gap:6px; flex-wrap:wrap; min-height:28px; }
+  .pg{ width:38px; height:26px; border-radius:6px; border:1px solid var(--line); background:#11161f;
+    display:flex; align-items:center; justify-content:center; font-size:11px; color:#5b6470;
+    transition:all .3s; opacity:0; transform:scale(.6); }
+  .pg.show{ opacity:1; transform:none; }
+  .pg.l3{ border-color:var(--green); color:#c4f7d4; background:linear-gradient(180deg,#0f3a1d,#0e2c18); }
+
+  #scene5 .ranks{ display:flex; flex-direction:column; gap:10px; }
+  .ranklane{ border:1px solid var(--line); border-radius:12px; padding:8px 12px; background:var(--panel2); transition:all .3s; }
+  .ranklane.active{ border-color:var(--blue); box-shadow:0 0 12px rgba(88,166,255,.22); }
+  .rankhdr{ display:flex; align-items:center; gap:10px; margin-bottom:6px; font-size:12px; }
+  .rankhdr .rname{ font-weight:700; color:var(--text); }
+  .rankhdr .tps{ display:flex; gap:4px; }
+  .rankhdr .tp{ width:8px; height:8px; border-radius:50%; background:#11161f; border:1px solid var(--line); transition:all .25s; }
+  .rankhdr .tp.on{ background:var(--blue); border-color:var(--blue); box-shadow:0 0 6px var(--blue); }
+  .rankhdr .rstat{ margin-left:auto; font-size:11px; padding:1px 9px; border-radius:999px; border:1px solid var(--line); color:var(--muted); }
+  .rankhdr .rstat.miss{ border-color:var(--red); color:#ffd4d0; }
+  .rankhdr .rstat.hit{ border-color:var(--green); color:#c4f7d4; }
+  .rankhdr .rstat.warn{ border-color:var(--amber); color:#ffe2ab; }
+  .htree{ display:flex; align-items:center; gap:8px; min-height:28px; }
+  .htree .root{ font-size:10px; color:var(--muted); padding:2px 7px; border:1px dashed var(--line); border-radius:6px; }
+  .htnode{ width:40px; height:26px; border-radius:6px; border:1px solid var(--line); background:#11161f;
+    display:flex; align-items:center; justify-content:center; font-size:11px; color:#5b6470; position:relative;
+    transition:all .35s; opacity:0; transform:translateY(-6px) scale(.7); }
+  .htnode::before{ content:""; position:absolute; left:-8px; top:50%; width:8px; height:1px; background:var(--line); }
+  .htnode:first-of-type::before{ display:none; }
+  .htnode.show{ opacity:1; transform:none; }
+  .htnode.committed{ border-color:var(--green); color:#c4f7d4; background:linear-gradient(180deg,#0f3a1d,#0e2c18); }
+  .htnode.inserting{ border-color:var(--blue); color:#cfe6ff; background:linear-gradient(180deg,#143055,#102844); }
+  .htnode.matched{ border-color:var(--cyan); color:#cdf6fa; box-shadow:0 0 8px rgba(86,212,221,.4); }
+  .htnode.warn{ border-color:var(--amber); color:#ffe2ab; background:#1d1808; }
+  .htnode.evict{ border-color:var(--red); color:#ffd4d0; background:linear-gradient(180deg,#3a1414,#2a1010); }
+  .story-sync{ display:flex; gap:14px; justify-content:center; margin:14px 0 6px; flex-wrap:wrap; }
+  .syncbadge{ font-size:12px; padding:6px 14px; border-radius:999px; border:1.5px solid var(--line); color:var(--muted); transition:all .3s; }
+  .syncbadge.g1.fire{ border-color:var(--amber); color:#ffe2ab; background:#1d1808; box-shadow:0 0 16px rgba(210,153,34,.45); transform:scale(1.04); }
+  .syncbadge.g2.fire{ border-color:var(--green); color:#c4f7d4; background:#0f2a18; box-shadow:0 0 16px rgba(63,185,80,.45); transform:scale(1.04); }
+  .consist-flag{ text-align:center; font-weight:700; font-size:14px; min-height:20px; margin-top:4px; }
+  .consist-flag.ok{ color:var(--green); } .consist-flag.bad{ color:var(--red); }
+
+  /* ---------- tab6: PrefetchAck alignment & anti-hang ---------- */
+  #scene6 .ackmesh{ display:flex; flex-direction:column; gap:8px; margin-bottom:6px; }
+  .ackrow{ display:flex; align-items:center; gap:10px; border:1px solid var(--line); border-radius:10px; padding:6px 10px; background:var(--panel2); transition:all .3s; }
+  .ackrow.blocked{ border-color:var(--red); box-shadow:0 0 12px rgba(248,81,73,.3); }
+  .acklabel{ width:190px; font-size:12px; color:var(--muted); }
+  .acklabel b{ color:var(--text); }
+  .ackslots{ flex:1; display:flex; gap:8px; }
+  .ackchip{ flex:1; height:34px; border-radius:8px; border:1px solid var(--line); background:#0d1117;
+    display:flex; align-items:center; justify-content:center; font-size:11px; color:#5b6470; gap:5px;
+    transition:all .3s; position:relative; }
+  .ackchip .err{ font-size:9px; color:var(--red); }
+  .ackchip.pending{ opacity:.4; }
+  .ackchip.emit{ border-color:var(--blue); color:#cfe6ff; background:linear-gradient(180deg,#143055,#102844); box-shadow:0 0 10px rgba(88,166,255,.4); }
+  .ackchip.passed{ border-color:var(--green); color:#c4f7d4; background:linear-gradient(180deg,#0f3a1d,#0e2c18); }
+  .ackchip.wait{ border-color:var(--amber); color:#ffe2ab; background:#1d1808; animation:pulse 1s infinite; }
+  .ackchip.missing{ border-color:var(--red); border-style:dashed; color:#ffd4d0; background:#2a1010; }
+  .barriers{ border:1px dashed var(--line); border-radius:10px; padding:8px 10px; margin-top:4px; }
+  .barlabel{ font-size:12px; color:var(--muted); margin-bottom:6px; text-align:center; }
+  .barrow{ display:flex; gap:8px; align-items:stretch; }
+  .barrow .barspacer{ width:190px; }
+  .barcols{ flex:1; display:flex; gap:8px; }
+  .bar{ flex:1; border:1.5px solid var(--line); border-radius:8px; padding:6px 4px; text-align:center;
+    font-size:11px; color:var(--muted); transition:all .3s; }
+  .bar .bcount{ display:block; font-size:13px; font-weight:800; margin-top:2px; color:#5b6470; }
+  .bar.waiting{ border-color:var(--amber); color:#ffe2ab; background:#1d1808; animation:pulse 1s infinite; }
+  .bar.waiting .bcount{ color:#ffe2ab; }
+  .bar.fired{ border-color:var(--green); color:#c4f7d4; background:#0f2a18; }
+  .bar.fired .bcount{ color:#c4f7d4; }
+  .bar.dead{ border-color:var(--red); color:#ffd4d0; background:#2a1010; box-shadow:0 0 12px rgba(248,81,73,.4); }
+  .bar.dead .bcount{ color:#ffd4d0; }
+</style>
+</head>
+<body>
+<button id="langBtn" onclick="toggleLang()">EN</button>
+<header>
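+  <h1>HiCache × Pipeline Parallel: Tree Consistency &amp; Deadlock Prevention</h1>
+  <p>Topology <b>PP=3 × TP=8 = 24 ranks</b> · rows = TP groups, columns = PP groups · MIN all-reduce keeps the radix tree consistent · 2 sets of gloo groups keep background collectives from deadlocking</p>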
+</header>
+
+<div class="tabs">
+  <button class="tab active" data-tab="story">① Two-request lifecycle (L3 hit/miss · consistent host tree)</button>
+  <button class="tab" data-tab="consistency">② Tree consistency (autoplay)</button>
+  <button class="tab" data-tab="deadlock">③ Why 2 groups do not deadlock</button>
+  <button class="tab" data-tab="skew">④ Async skew × MIN keeps ranks in step</button>
+  <button class="tab" data-tab="threads">⑤ Thread relationships &amp; tree consistency</button>
+  <button class="tab" data-tab="ackalign">⑥ PrefetchAck alignment &amp; hang prevention</button>
+</div>
+
+<div class="wrap">
+  <!-- ============ TAB 5 (story, shown first) ============ -->
+  <div class="scene" id="scene5">
+    <div class="legend">
+      <span class="chip"><span class="sw" style="background:var(--blue)"></span>GPU compute / inserting</span>
+      <span class="chip"><span class="sw" style="background:var(--cyan)"></span>prefix matched</span>
+      <span class="chip"><span class="sw" style="background:var(--amber)"></span>ranks disagree (pending MIN)</span>
+      <span class="chip"><span class="sw" style="background:var(--green)"></span>committed / consistent</span>
+      <span class="chip"><span class="sw" style="background:var(--red)"></span>miss / evicted</span>
+    </div>
+    <div class="story-top">
+      <div class="gpu-badge" id="gpuBadge"><span class="ic">▣</span><span>GPU compute</span></div>
+      <div class="l3box" id="l3box">
+        <div class="l3title"><span class="l3lab"><b>L3 persistent storage</b> (storage backend, view shared by the 3 ranks)</span><span class="badge" id="l3badge"></span></div>
+        <div class="pagerow" id="l3pages"></div>
+      </div>
+    </div>
+    <div class="ranks" id="ranks"></div>
+    <div class="story-sync">
+      <div class="syncbadge g1" id="s5sync1">◆ MIN group 1 · prefetch_hits_sync_groups · storage_hit_count</div>
+      <div class="syncbadge g2" id="s5sync2">◆ MIN group 2 · prefetch_completion_sync_groups · completed_tokens</div>
+    </div>
+    <div class="consist-flag" id="s5flag"></div>
+    <div class="caption" id="cap5">Autoplaying…</div>
+  </div>
+  <div class="controls" id="ctl5">
+    <button class="ctl primary" id="play5">⏸ Pause</button>
+    <button class="ctl" id="replay5">⟲ Replay</button>
+  </div>
+
+  <!-- ============ TAB 1 ============ -->
+  <div class="scene hidden" id="scene1">
+    <div class="legend">
+      <span class="chip"><span class="sw" style="background:var(--amber)"></span>hit count truncated (inconsistent)</span>
+      <span class="chip"><span class="sw" style="background:var(--blue)"></span>after MIN within the TP group</span>
+      <span class="chip"><span class="sw" style="background:var(--green)"></span>after MIN within the PP group (globally consistent)</span>
+    </div>
+    <div id="mesh1"></div>
+    <div class="tree-box">
+      <div class="tree-title" id="treeTitle">All 24 ranks share the same radix tree</div>
+      <div class="tree" id="sharedTree"></div>
+    </div>
+    <div class="caption" id="cap1">Autoplaying…</div>
+  </div>
+  <div class="controls">
+    <button class="ctl primary" id="play1">⏸ Pause</button>
+    <button class="ctl" id="replay1">⟲ Replay</button>
+  </div>
+
+  <!-- ============ TAB 2 ============ -->
+  <div class="scene hidden" id="scene2">
+    <div class="legend">
+      <span class="chip"><span class="sw" style="background:var(--purple)"></span><b>prefetch_thread</b> (dedicated background thread) · reduce(storage_hit_count)</span>
+      <span class="chip"><span class="sw" style="background:var(--cyan)"></span><b>prefetch_sync_thread</b> (dedicated background thread) · reduce(completed_tokens)</span>
+    </div>
+    <div class="note" style="margin-bottom:8px;">Each cell = 1 rank running 2 independent background threads (the small dots ●A ●B). Each row is a <b>TP communicator</b>; each column is a <b>PP communicator</b>.</div>
+    <div id="mesh2"></div>
+    <div class="groups">
+      <div class="grp g1" id="g1"><b>prefetch_hits_sync_groups</b><br>hit-page reduction groups (TP ring + PP ring)<br><span style="font-size:11px">reduce(storage_hit_count)</span></div>
+      <div class="grp g2" id="g2"><b>prefetch_completion_sync_groups</b><br>completed-token reduction groups (TP ring + PP ring)<br><span style="font-size:11px">reduce(completed_tokens)</span></div>
+    </div>
+    <div class="banner" id="banner2"></div>
+    <div class="caption" id="cap2">Pick a scenario: <b>1 set of groups</b> deadlocks; <b>2 sets of groups</b> is safe.</div>
+  </div>
+  <div class="controls" id="ctl2">
+    <button class="ctl alt" id="play1grp">▶ 1 set of groups (deadlock)</button>
+    <button class="ctl primary" id="play2grp">▶ 2 sets of groups (safe)</button>
+    <button class="ctl" id="reset2">Reset</button>
+  </div>
+
+  <!-- ============ TAB 3 ============ -->
+  <div class="scene hidden" id="scene3">
+    <!-- layer 3: pipeline timing (top) -->
+    <div class="t3-section">
+      <div class="t3-title">③ Main PP pipeline execution <strong>timeline</strong> <span class="tag gpu">NCCL · GPU</span>
+        <span style="color:var(--muted);font-size:11px;">continuous, staggered flow, <strong style="color:var(--green)">never interrupted by background prefetch sync</strong></span>
+      </div>
+      <div class="pipe" id="pipe">
+        <div class="lane s0"><span class="lname">PP stage 0</span><span class="lane-hint" id="hint0"></span></div>
+        <div class="lane s1"><span class="lname">PP stage 1</span><span class="lane-hint" id="hint1"></span></div>
+        <div class="lane s2"><span class="lname">PP stage 2</span><span class="lane-hint" id="hint2"></span></div>
+      </div>
+      <div class="flow-note">↑ The pipeline runs exactly the <strong>mb0→mb3</strong> formed in ②, advancing diagonally and staggered through stage0→1→2</div>
+    </div>
+
+    <div class="t3-conduit" id="arrow1">
+      <span class="clabel">▲ The formed <b>batch &amp; micro-batch order</b> is fed to the pipeline (content)</span>
+      <span class="spark"></span><span class="spark s2"></span>
+    </div>
+
+    <!-- layer 2: batch former (middle) -->
+    <div class="t3-section">
+      <div class="t3-title">② All three PP ranks form batches from the <strong>same storage hit</strong> (contents must match on every rank)</div>
+      <div class="formers" id="formers"></div>
+    </div>
+
+    <div class="t3-conduit" id="arrow2">
+      <span class="clabel">▲ <code>all_reduce(MIN)</code> outputs the unified value <b>6</b> → determines the batch size</span>
+      <span class="spark"></span><span class="spark s2"></span>
+    </div>
+
+    <!-- layer 1: async prefetch + MIN (bottom = causal source) -->
+    <div class="t3-section">
+      <div class="t3-title">① Async prefetch query → <code>all_reduce(MIN)</code> <span class="tag cpu">gloo · CPU background thread</span></div>
+      <div class="sync-area" id="syncArea">
+        <div class="clock" id="t3clock">t = 0.0s</div>
+        <div class="barrier" id="barrier" style="left:62%"><span class="blabel">all_reduce(MIN)</span></div>
+        <div class="slabel" id="sl0">rank pp0</div><div class="pkt travel" id="pkt0"><span>pp0 storage hit</span><span class="hv">8</span></div>
+        <div class="slabel" id="sl1">rank pp1</div><div class="pkt travel" id="pkt1"><span>pp1 storage hit</span><span class="hv">6</span></div>
+        <div class="slabel" id="sl2">rank pp2</div><div class="pkt travel" id="pkt2"><span>pp2 storage hit</span><span class="hv">7</span></div>
+      </div>
+    </div>
+    <div class="caption" id="cap3">Autoplaying…</div>
+  </div>
+  <div class="controls" id="ctl3">
+    <button class="ctl primary" id="play3">⏸ Pause</button>
+    <button class="ctl" id="replay3">⟲ Replay</button>
+  </div>
+
+  <!-- ============ TAB 4 ============ -->
+  <div class="scene hidden" id="scene4">
+    <div class="t4wrap">
+      <!-- left: thread data-flow -->
+      <div class="t4flow">
+        <div class="tbox sched" id="t4b0">
+          <div class="tname">Scheduler <span class="pin">main thread</span></div>
+          <div class="tdesc">Issues prefetch requests (writeback / load)</div>
+        </div>
+        <div class="tarrow" id="t4a0">▼ <b>prefetch_queue</b> (PrefetchOperation)</div>
+
+        <div class="tbox thread-hit" id="t4b1">
+          <div class="tname">① prefetch_thread <span class="pin">storage-hit thread</span></div>
+          <div class="tdesc"><code>_storage_hit_query()</code> queries the number of pages hit in L3; enough hits → put into prefetch_buffer, too few → prefetch_revoke_queue</div>
+        </div>
+        <div class="minnode" id="t4m1">◆ all_reduce(MIN) storage_hit_count
+          <small>@ prefetch_hits_sync_groups (group 1, gloo/CPU, TP ring + PP ring)</small></div>
+        <div class="tarrow" id="t4a1">▼ <b>prefetch_buffer</b></div>
+
+        <div class="tbox thread-io" id="t4b2">
+          <div class="tname">② prefetch_io_aux_thread <span class="pin">IO loading thread</span></div>
+          <div class="tdesc"><code>_page_transfer()</code> reads pages from L3 into host memory batch by batch; accumulates <b>completed_tokens</b>; <b>each storage batch produces 1 PrefetchAck</b> (even when an error occurs)</div>
+        </div>
+        <div class="tarrow" id="t4a2">▼ <b>prefetch_sync_queue</b> (PrefetchAck)</div>
+
+        <div class="tbox thread-sync" id="t4b3">
+          <div class="tname">③ prefetch_sync_thread <span class="pin">completion-token thread</span></div>
+          <div class="tdesc">Reduces the <b>completed_tokens</b> of each ack</div>
+        </div>
+        <div class="minnode g2" id="t4m2">◆ all_reduce(MIN) completed_tokens
+          <small>@ prefetch_completion_sync_groups (group 2, gloo/CPU, TP ring + PP ring)</small></div>
+        <div class="tarrow" id="t4a3">▼ <b>ack_prefetch_queue</b></div>
+
+        <div class="tbox sched" id="t4b4">
+          <div class="tname">Scheduler writes to the host radix tree</div>
+          <div class="tdesc">Inserts only the prefix of length <b>completed_tokens</b> → <code>_insert_helper_host()</code></div>
+        </div>
+      </div>
+
+      <!-- right: why consistent -->
+      <div class="t4why">
+        <div class="whycard a" id="t4wa">
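+          <h4>Why does MIN(storage_hit) give consistency?</h4>
+          <p>Hit counts can differ across ranks (host-memory truncation, differing L3 views). MIN picks the <b>longest prefix that every rank can hit</b> → all ranks <b>fetch the same range</b> instead of each fetching a different length.</p>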
+        </div>
+        <div class="whycard b" id="t4wb">
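+          <h4>Why does MIN(completed_tokens) give consistency?</h4>
+          <p>Even with the same fetch range, the actual page-by-page load can still <b>partially fail</b> (<code>page_get</code> returns n≠batch). MIN commits only the <b>longest common prefix that every rank loaded successfully</b> → the length written to the host tree is the same on every rank.</p>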
+        </div>
+        <div class="whycard c" id="t4wc">
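+          <h4>Why is there no hang?</h4>
+          <p>Every storage batch <b>produces exactly one PrefetchAck</b> (even when an error occurs) → every rank joins <b>exactly the same number</b> of reduces, so the collectives line up one-to-one. Together the two MINs guarantee: <b>the prefix inserted into the host tree is identical on every rank → consistent trees</b>.</p>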
+        </div>
+      </div>
+    </div>
+    <div class="caption" id="cap4">Two MIN sync points (group 1: hit count, group 2: completed count) plus exactly 1 ack per batch together keep the host radix tree strictly consistent across PP ranks.</div>
+  </div>
+  <div class="controls" id="ctl4">
+    <button class="ctl primary" id="play4">⏸ Pause</button>
+    <button class="ctl" id="replay4">⟲ Replay</button>
+  </div>
+
+  <!-- ============ TAB 6 : PrefetchAck alignment & anti-hang ============ -->
+  <div class="scene hidden" id="scene6">
+    <div class="note" style="margin-bottom:10px;">Each <b>storage batch</b> always produces <b>1 PrefetchAck</b> in <code>_page_transfer</code>; <code>prefetch_sync_thread</code> runs one <code>all_reduce(MIN)</code> on group 2 for <strong>each ack</strong>. So <b>ack count = batch count = number of group 2 collectives</b>, and this must be equal on every rank.</div>
+    <div class="legend">
+      <span class="chip"><span class="sw" style="background:var(--blue)"></span>ack emitted (joins this round of reduce)</span>
+      <span class="chip"><span class="sw" style="background:var(--green)"></span>barrier reached 3/3 → passes</span>
+      <span class="chip"><span class="sw" style="background:var(--amber)"></span>arrived, waiting for the absent rank</span>
+      <span class="chip"><span class="sw" style="background:var(--red)"></span>missing ack → the wait never ends</span>
+    </div>
+    <div class="ackmesh" id="ackmesh"></div>
+    <div class="barriers">
+      <div class="barlabel">◆ <code>all_reduce(MIN)</code> @ group 2 (prefetch_completion_sync_groups) · one barrier per ack</div>
+      <div class="barrow"><div class="barspacer"></div><div class="barcols" id="barcols"></div></div>
+    </div>
+    <div class="banner" id="banner6"></div>
+    <div class="caption" id="cap6">Pick a scenario: <b>always 1 ack per batch</b> → counts align, safe; <b>break on error</b> → one ack missing → group 2 reduces misalign → hang.</div>
+  </div>
+  <div class="controls" id="ctl6">
+    <button class="ctl primary" id="play6good">▶ Correct (always 1 ack per batch)</button>
+    <button class="ctl alt" id="play6bad">▶ Wrong (break on error → missing ack)</button>
+    <button class="ctl" id="reset6">Reset</button>
+  </div>
+</div>
+
+<script>
+const PP=3, TP=8;
+// initial hit counts per (pp,tp). some truncated by host-mem pressure.
+const HITS=[
+  [8,8,8,8,8,8,8,8],
+  [8,8,6,8,8,8,8,7],   // stage1: rank(1,2)=6, rank(1,7)=7 truncated
+  [8,8,8,8,8,8,8,8],
+];
+const ROWMIN = HITS.map(r=>Math.min(...r));      // [8,6,8]
+const GMIN = Math.min(...ROWMIN);                // 6
+const sleep=ms=>new Promise(r=>setTimeout(r,ms));
+
+/* ---------- build a mesh ---------- */
+function buildMesh(containerId, withThreads){
+  const c=document.getElementById(containerId);
+  let html='<div class="mesh-head"><div class="tp-hdr">';
+  for(let t=0;t<TP;t++) html+=`<div class="th">TP ${t}</div>`;
+  html+='</div></div>';
+  for(let p=0;p<PP;p++){
+    html+=`<div class="pp-row"><div class="pp-label"><b>PP stage ${p}</b><br>(TP group)</div><div class="row-cells" id="${containerId}-row${p}">`;
+    for(let t=0;t<TP;t++){
+      html+=`<div class="cell" id="${containerId}-c${p}-${t}">
+        <div class="v">—</div>
+        <div class="rk">r${p*TP+t}</div>
+        ${withThreads?`<div class="tdots"><span class="td a" id="${containerId}-A-${p}-${t}"></span><span class="td b" id="${containerId}-B-${p}-${t}"></span></div>`:''}
+      </div>`;
+    }
+    html+='</div></div>';
+  }
+  // pp-group footer (columns)
+  html+='<div class="pp-foot"><div class="lab">PP group (column)<br>each spans 3 stages →</div><div class="cols">';
+  for(let t=0;t<TP;t++) html+=`<div class="col" id="${containerId}-col${t}">r${t}·r${t+TP}·r${t+2*TP}</div>`;
+  html+='</div></div>';
+  c.innerHTML=html;
+}
+const cell=(m,p,t)=>document.getElementById(`${m}-c${p}-${t}`);
+const val =(m,p,t)=>cell(m,p,t).querySelector('.v');
+
+buildMesh('mesh1',false);
+buildMesh('mesh2',true);
+
+/* ============================================================
+   TAB 1 : auto-play loop
+   query -> diverge -> TP all_reduce(MIN) -> PP all_reduce(MIN) -> consistent tree
+   ============================================================ */
+let t1Token=0, t1Paused=false;
+const cap1=document.getElementById('cap1');
+const sharedTree=document.getElementById('sharedTree');
+
+function resetMesh1(){
+  for(let p=0;p<PP;p++) for(let t=0;t<TP;t++){
+    const cl=cell('mesh1',p,t); cl.className='cell';
+    val('mesh1',p,t).innerHTML='—';
+  }
+  sharedTree.innerHTML='';
+}
+async function gate(my){ while(t1Paused){ await sleep(120); if(my!==t1Token) throw 0; } }
+async function step(ms,my){ await sleep(ms); await gate(my); if(my!==t1Token) throw 0; }
+
+async function runTab1(){
+  const my=++t1Token;
+  try{
+    while(true){
+      resetMesh1();
+      cap1.innerHTML='Topology <b>PP=3 × TP=8 = 24 ranks</b>: each PP stage has 8 TP ranks.';
+      await step(1600,my);
+
+      // 1) independent query
+      for(let p=0;p<PP;p++) for(let t=0;t<TP;t++){
+        const v=HITS[p][t]; const cl=cell('mesh1',p,t);
+        val('mesh1',p,t).innerHTML=v+' <small>pg</small>';
+        if(v!==8) cl.classList.add('varied');
+        await sleep(35);
+      }
+      cap1.innerHTML='① Each rank queries L3 for prefix hits <span class="k">independently</span>. <b style="color:var(--amber)">Note that r10 and r15 are truncated under host memory pressure</b> (6 / 7 pages).';
+      await step(2200,my);
+
+      // 2) diverge warning
+      cap1.innerHTML='② If each rank builds its radix tree from its own hit count → tree heights diverge → later PP collectives hit a <b style="color:var(--red)">shape mismatch → crash</b>.';
+      for(let p=0;p<PP;p++) for(let t=0;t<TP;t++) if(HITS[p][t]!==8){ cell('mesh1',p,t).classList.add('bad'); }
+      await step(2200,my);
+      for(let p=0;p<PP;p++) for(let t=0;t<TP;t++) cell('mesh1',p,t).classList.remove('bad','varied');
+
+      // 3) TP all_reduce(MIN) — sweep each row (all rows in parallel)
+      cap1.innerHTML='③ Step 1: <code>all_reduce(MIN)</code> within each <span class="k">TP group (one row of 8 ranks)</span>.';
+      for(let t=0;t<TP;t++){
+        for(let p=0;p<PP;p++) cell('mesh1',p,t).classList.add('sweep');
+        await step(110,my);
+        for(let p=0;p<PP;p++) cell('mesh1',p,t).classList.remove('sweep');
+      }
+      for(let p=0;p<PP;p++) for(let t=0;t<TP;t++){
+        const cl=cell('mesh1',p,t); cl.classList.add('tpmin');
+        val('mesh1',p,t).innerHTML=ROWMIN[p]+' <small>pg</small>';
+      }
+      cap1.innerHTML='③ After the TP-group reduction: <b>every row is consistent</b> (PP0=8, PP1=6, PP2=8 = per-row minimum).';
+      await step(1900,my);
+
+      // 4) PP all_reduce(MIN) — sweep each column (top->bottom)
+      cap1.innerHTML='④ Step 2: <code>all_reduce(MIN)</code> within each <span class="k">PP group (one column of 3 ranks)</span> → converges to the global minimum.';
+      for(let p=0;p<PP;p++){
+        for(let t=0;t<TP;t++) cell('mesh1',p,t).classList.add('sweep');
+        await step(180,my);
+        for(let t=0;t<TP;t++) cell('mesh1',p,t).classList.remove('sweep');
+      }
+      for(let p=0;p<PP;p++) for(let t=0;t<TP;t++){
+        const cl=cell('mesh1',p,t); cl.classList.remove('tpmin'); cl.classList.add('gmin');
+        val('mesh1',p,t).innerHTML=GMIN+' <small>pg</small>';
+      }
+      cap1.innerHTML='④ After the PP-group reduction: <b style="color:var(--green)">all 24 ranks have hit count = 6</b> (the longest common prefix).';
+      await step(1700,my);
+
+      // 5) shared consistent tree
+      cap1.innerHTML='⑤ Every rank prefetches / builds its tree only up to 6 → <span style="color:var(--green)">the radix trees on all 24 ranks are identical ✓</span>';
+      sharedTree.innerHTML='';
+      for(let i=0;i<GMIN;i++){
+        const n=document.createElement('div'); n.className='tnode'; n.textContent='page '+i; sharedTree.appendChild(n);
+        await step(120,my); n.classList.add('show');
+      }
+      await step(2600,my);
+    }
+  }catch(e){ /* cancelled */ }
+}
+
+document.getElementById('play1').onclick=function(){
+  t1Paused=!t1Paused;
+  this.textContent=t1Paused?'▶ Play':'⏸ Pause';
+};
+document.getElementById('replay1').onclick=()=>{ t1Paused=false; document.getElementById('play1').textContent='⏸ Pause'; runTab1(); };
+
+/* ============================================================
+   TAB 2 : why 2 group-sets avoid deadlock (PP×TP mesh)
+   ============================================================ */
+let t2Token=0;
+const cap2=document.getElementById('cap2');
+const banner2=document.getElementById('banner2');
+const g1=document.getElementById('g1'), g2=document.getElementById('g2');
+const row2=p=>document.getElementById('mesh2-row'+p);
+const col2=t=>document.getElementById('mesh2-col'+t);
+const dotEl=(op,p,t)=>document.getElementById(`mesh2-${op}-${p}-${t}`);
+
+function resetMesh2(){
+  ++t2Token;
+  for(let p=0;p<PP;p++){
+    row2(p).className='row-cells';
+    for(let t=0;t<TP;t++){
+      const cl=cell('mesh2',p,t); cl.className='cell';
+      val('mesh2',p,t).innerHTML=GMIN+' <small>pg</small>';
+      dotEl('A',p,t).className='td a'; dotEl('B',p,t).className='td b';
+    }
+  }
+  for(let t=0;t<TP;t++) col2(t).className='col';
+  g1.classList.remove('hot'); g2.classList.remove('hot'); g2.style.opacity=1;
+  banner2.className='banner'; banner2.textContent='';
+}
+async function s2(ms,my){ await sleep(ms); if(my!==t2Token) throw 0; }
+
+/* ---- 1 shared group set -> deadlock ---- */
+async function play1Group(){
+  resetMesh2(); const my=t2Token;
+  try{
+    g2.style.opacity=.25; g1.classList.add('hot');
+    cap2.innerHTML='Only <b>1 group set</b>: prefetch_thread (A) and prefetch_sync_thread (B) share the same set of communicators.';
+    await s2(800,my);
+
+    // each rank's two threads race: some submit A first (purple), some B first (cyan)
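+    cap2.innerHTML='The two background threads are <b>scheduled independently, in no fixed order</b>: within one TP ring, some ranks submit A first and others submit B first.';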
+    for(let p=0;p<PP;p++){
+      for(let t=0;t<TP;t++){
+        const aFirst=((p+t)%2===0);      // deterministic but mixed within each row
+        const first=aFirst?'A':'B';
+        dotEl(first,p,t).classList.add('on');
+      }
+    }
+    await s2(1200,my);
+
+    // rings cannot align -> red
+    cap2.innerHTML='On the same communicator, the ranks have submitted <b style="color:var(--red)">different</b> collectives (A and B are misaligned) → the rendezvous can never match.';
+    for(let p=0;p<PP;p++){
+      row2(p).classList.add('ring-bad');
+      for(let t=0;t<TP;t++){
+        cell('mesh2',p,t).classList.add('bad');
+        const aFirst=((p+t)%2===0);
+        dotEl(aFirst?'A':'B',p,t).className=`td ${aFirst?'a':'b'} dead`;
+      }
+    }
+    for(let t=0;t<TP;t++) col2(t).classList.add('ring-bad');
+    await s2(700,my);
+    banner2.className='banner bad'; banner2.textContent='💥 DEADLOCK: the entire 24-rank job hangs';
+    cap2.innerHTML='As soon as A and B interleave on any one communicator, that ring deadlocks → global PP/TP communication stalls in a chain reaction.';
+  }catch(e){}
+}
+
+/* ---- 2 group sets -> safe ---- */
+async function play2Groups(){
+  resetMesh2(); const my=t2Token;
+  try{
+    g1.classList.add('hot'); g2.classList.add('hot');
+    cap2.innerHTML='With <b>2 independent group sets</b>: <b style="color:var(--purple)">A always uses prefetch_hits_sync_groups</b>, and <b style="color:var(--cyan)">B always uses prefetch_completion_sync_groups</b>.';
+    await s2(800,my);
+
+    // wave A: every rank's prefetch_thread submits A to group-set-1; TP rings + PP rings light purple
+    cap2.innerHTML='Wave 1: the <b>prefetch_thread</b> on every rank submits A only on <code>prefetch_hits_sync_groups</code> → identical sequence.';
+    for(let p=0;p<PP;p++){
+      row2(p).classList.add('ring-a');
+      for(let t=0;t<TP;t++) dotEl('A',p,t).classList.add('on');
+    }
+    for(let t=0;t<TP;t++) col2(t).classList.add('ring-a');
+    await s2(1100,my);
+    for(let p=0;p<PP;p++){
+      row2(p).className='row-cells';
+      for(let t=0;t<TP;t++) dotEl('A',p,t).className='td a done';
+    }
+    for(let t=0;t<TP;t++) col2(t).className='col';
+    cap2.innerHTML='✓ A has arrived on all TP rings and PP rings → the first-wave reduction completes.';
+    await s2(900,my);
+
+    // wave B: prefetch_sync_thread submits B to group-set-2; rings light cyan
+    cap2.innerHTML='Wave 2: the <b>prefetch_sync_thread</b> on every rank submits B only on <code>prefetch_completion_sync_groups</code> → identical sequence.';
+    for(let p=0;p<PP;p++){
+      row2(p).classList.add('ring-b');
+      for(let t=0;t<TP;t++) dotEl('B',p,t).classList.add('on');
+    }
+    for(let t=0;t<TP;t++) col2(t).classList.add('ring-b');
+    await s2(1100,my);
+    for(let p=0;p<PP;p++){
+      row2(p).className='row-cells';
+      for(let t=0;t<TP;t++){ dotEl('B',p,t).className='td b done'; cell('mesh2',p,t).classList.add('gmin'); }
+    }
+    for(let t=0;t<TP;t++) col2(t).className='col';
+    banner2.className='banner ok'; banner2.textContent='✅ Safe: all 24 ranks aligned and finished';
+    cap2.innerHTML='The collective sequence on each communicator is <b style="color:var(--green)">identical</b> on all ranks (A → set 1, B → set 2, never crossing) → no deadlock.';
+  }catch(e){}
+}
+
+document.getElementById('play1grp').onclick=play1Group;
+document.getElementById('play2grp').onclick=play2Groups;
+document.getElementById('reset2').onclick=()=>{ resetMesh2(); cap2.innerHTML='Pick a scenario: <b>1 group set</b> deadlocks; <b>2 group sets</b> are safe.'; };
+
+/* ============================================================
+   TAB 3 : async PP skew  x  all_reduce(MIN) unifies pace
+   Top: continuous, skewed micro-batch pipeline (CSS infinite) — never pauses.
+   Bottom: time-driven (rAF). Each rank's prefetch op arrives at the MIN barrier
+   at a DIFFERENT wall-clock time (async skew). Early arrivals park & wait on the
+   gloo CPU group (background thread). When the slowest arrives, one MIN flash
+   unifies all three to 6 and they depart together — while the top pipeline keeps
+   flowing untouched.
+   ============================================================ */
+const NMB=4;                          // micro-batches per batch (illustrative)
+const MB_LABELS=Array.from({length:NMB},(_,k)=>'mb'+k);
+// build top pipeline micro-batches: one controllable block per (stage, mb)
+(function buildPipe(){
+  for(let s=0;s<3;s++){
+    const lane=document.querySelector('#pipe .s'+s);
+    for(let k=0;k<NMB;k++){
+      const mb=document.createElement('div');
+      mb.className='mb'; mb.id=`pmb-${s}-${k}`; mb.textContent=MB_LABELS[k];
+      lane.appendChild(mb);
+    }
+  }
+})();
+// build batch formers (one per PP rank): hit input + ordered mb chips
+(function buildFormers(){
+  const host=document.getElementById('formers');
+  for(let p=0;p<3;p++){
+    const f=document.createElement('div'); f.className='former'; f.id='former'+p;
+    let chips='';
+    for(let k=0;k<NMB;k++) chips+=`<div class="mbchip" id="fchip-${p}-${k}">${MB_LABELS[k]}</div>`;
+    f.innerHTML=`<h5>Scheduler · PP rank ${p}</h5>
+      <div class="hitbox"><span class="ht1">cached-prefix storage hit = </span><b id="fhit${p}">?</b><span class="ht2"> pages → determines batch composition</span></div>
+      <div class="mbrow">${chips}</div>
+      <div class="chk" id="fchk${p}"></div>`;
+    host.appendChild(f);
+  }
+})();
+const arrow1=document.getElementById('arrow1');
+const arrow2=document.getElementById('arrow2');
+
+const PKT=[
+  { el:document.getElementById('pkt0'), lab:document.getElementById('sl0'), y:34,  hit:8, arrive:2.6 },
+  { el:document.getElementById('pkt1'), lab:document.getElementById('sl1'), y:85,  hit:6, arrive:1.9 }, // arrives first
+  { el:document.getElementById('pkt2'), lab:document.getElementById('sl2'), y:136, hit:7, arrive:3.9 }, // slowest -> everyone waits
+];
+const GMIN3=Math.min(...PKT.map(p=>p.hit)); // 6
+const T3={ START:0.4, SYNC:3.9, FLASH_END:4.5, DEPART_END:5.4,
+           DROP:4.5, MB_START:5.0, MB_STEP:0.35, READY:6.4, CYCLE:14.0,
+           PIPE_START:6.4, MB_LAG:1.0, STAGE_LAG:0.9, MB_TRAVEL:3.2,
+           X0:11, XB:62, X1:97 }; // seconds / percentages
+const cap3=document.getElementById('cap3');
+const barrier=document.getElementById('barrier');
+const t3clock=document.getElementById('t3clock');
+let t3raf=null, t3on=false, t3paused=false, t3start=0, t3lastT=0;
+
+function placePkt(p){ p.lab.style.top=p.y+'px'; p.el.style.top=p.y+'px'; }
+PKT.forEach(placePkt);
+function lerp(a,b,u){ return a+(b-a)*Math.max(0,Math.min(1,u)); }
+// returns progress 0..1 if t (or its wrapped form) is inside [st,en), else -1
+function pipeActive(st,en,t){
+  if(t>=st && t<en) return (t-st)/(en-st);
+  const tw=t+T3.CYCLE;
+  if(tw>=st && tw<en) return (tw-st)/(en-st);   // previous batch still draining across loop
+  return -1;
+}
+
+function resetFormers(){
+  arrow1.classList.remove('hot'); arrow2.classList.remove('hot');
+  for(let p=0;p<3;p++){
+    document.getElementById('fhit'+p).textContent='?';
+    document.getElementById('former'+p).classList.remove('ready');
+    document.getElementById('fchk'+p).textContent='';
+    for(let k=0;k<NMB;k++) document.getElementById(`fchip-${p}-${k}`).className='mbchip';
+  }
+}
+
+function t3frame(now){
+  if(!t3on){ return; }
+  if(!t3paused){
+    let t=((now - t3start)/1000) % T3.CYCLE;
+    t3lastT=t;
+    t3clock.textContent='t = '+t.toFixed(1)+'s';
+    barrier.classList.toggle('fire', (t>=T3.SYNC && t<T3.FLASH_END));
+
+    // --- layer 1: async packets toward MIN barrier ---
+    PKT.forEach(p=>{
+      let xPct, cls;
+      if(t < T3.START){ xPct=T3.X0; cls='travel'; }
+      else if(t < p.arrive){ xPct=lerp(T3.X0, T3.XB, (t-T3.START)/(p.arrive-T3.START)); cls='travel'; }
+      else if(t < T3.FLASH_END){ xPct=T3.XB; cls='wait'; }
+      else if(t < T3.DEPART_END){ xPct=lerp(T3.XB, T3.X1, (t-T3.FLASH_END)/(T3.DEPART_END-T3.FLASH_END)); cls='unified'; }
+      else { xPct=T3.X1; cls='unified'; }
+      p.el.style.left='calc('+xPct+'% - 54px)';
+      p.el.className='pkt '+cls;
+      p.el.querySelector('.hv').textContent = (t>=T3.SYNC? GMIN3 : p.hit);
+    });
+
+    // --- layer 2: batch formers driven by unified value ---
+    if(t < T3.FLASH_END){
+      resetFormers();
+    } else {
+      arrow2.classList.add('hot');                 // MIN -> value out
+      for(let p=0;p<3;p++) document.getElementById('fhit'+p).textContent=GMIN3;
+      // light mb chips in identical order across all three formers
+      let lit=0;
+      for(let k=0;k<NMB;k++){
+        if(t >= T3.MB_START + k*T3.MB_STEP){
+          for(let p=0;p<3;p++) document.getElementById(`fchip-${p}-${k}`).classList.add('on');
+          lit++;
+        }
+      }
+      if(t >= T3.READY){
+        arrow1.classList.add('hot');               // batch -> pipeline content
+        for(let p=0;p<3;p++){
+          document.getElementById('former'+p).classList.add('ready');
+          document.getElementById('fchk'+p).textContent='✓ identical batch & mb order';
+          for(let k=0;k<NMB;k++) document.getElementById(`fchip-${p}-${k}`).className='mbchip fixed';
+        }
+      } else {
+        arrow1.classList.remove('hot');
+      }
+    }
+
+    // --- layer 3: the formed mb0..mb3 flow through stage0->1->2 (diagonal pipeline) ---
+    for(let s=0;s<3;s++){
+      for(let k=0;k<NMB;k++){
+        const b=document.getElementById(`pmb-${s}-${k}`);
+        const st=T3.PIPE_START + k*T3.MB_LAG + s*T3.STAGE_LAG;
+        const u=pipeActive(st, st+T3.MB_TRAVEL, t);   // handles cycle wrap (previous batch still draining)
+        if(u>=0){ b.style.left=(19+u*73)+'%'; b.style.opacity=1; }
+        else b.style.opacity=0;
+      }
+      document.getElementById('hint'+s).textContent = (t>1.4 && t<T3.PIPE_START) ? '(waiting for the batch formed in step ②…)' : '';
+    }
+
+    // --- captions ---
+    if(t < T3.START) cap3.innerHTML='① The prefetch queries of the three PP ranks are <strong>issued asynchronously</strong> (they arrive at different times).';
+    else if(t < PKT[2].arrive) cap3.innerHTML='① Ranks that arrive early <b style="color:var(--amber)">wait for alignment</b> on the <span class="k">gloo CPU background thread</span> (no GPU time used).';
+    else if(t < T3.FLASH_END) cap3.innerHTML='① <b style="color:var(--amber)">pp2 arrives last</b> → <code>all_reduce(MIN)</code> unifies 8/6/7 <strong style="color:var(--green)">to 6</strong>.';
+    else if(t < T3.READY) cap3.innerHTML='② The unified <b>storage hit = 6</b> is delivered to the scheduler on each rank → determines the <strong>cached prefix length / batch size / micro-batch order</strong> (mb0→mb3).';
+    else cap3.innerHTML='③ Because all three ranks receive <strong style="color:var(--green)">the same value, 6</strong>, they form <strong style="color:var(--green)">identical batches and mb order</strong> and feed them into the PP pipeline; execution keeps flowing without interruption.<br><span style="color:var(--red)">⚠ If the storage hit is not unified → batch/mb order diverges rank by rank → PP scheduling misaligns and hangs.</span>';
+  }
+  t3raf=requestAnimationFrame(t3frame);
+}
+function startTab3(restart){
+  t3on=true;
+  if(restart || !t3start){ t3start=performance.now(); t3paused=false; document.getElementById('play3').textContent='⏸ Pause'; }
+  document.getElementById('pipe').classList.remove('paused');
+  if(!t3raf) t3raf=requestAnimationFrame(t3frame);
+}
+function stopTab3(){ t3on=false; if(t3raf){ cancelAnimationFrame(t3raf); t3raf=null; } }
+
+document.getElementById('play3').onclick=function(){
+  t3paused=!t3paused;
+  t3start=performance.now() - t3lastT*1000; // re-anchor the clock so t resumes from the frozen value (same on pause and resume)
+  this.textContent=t3paused?'▶ Play':'⏸ Pause';
+  document.getElementById('pipe').classList.toggle('paused', t3paused);
+};
+document.getElementById('replay3').onclick=()=>startTab3(true);
+
+/* ============================================================
+   TAB 4 : animated walk-through of the prefetch thread pipeline.
+   Highlights each stage in sequence; the chain "lights up" as the
+   data (PrefetchOperation → Ack → completed_tokens) flows down, and
+   the matching right-side why-card glows at each MIN sync point.
+   ============================================================ */
+let t4Token=0, t4Paused=false;
+const cap4El=document.getElementById('cap4');
+// [ids-to-light, caption]
+const T4SEQ=[
+  [['t4b0'], 'The scheduler main thread enqueues a prefetch request (writeback / load), kicking off the background pipeline.'],
+  [['t4a0'], 'A <b>PrefetchOperation</b> is enqueued on <code>prefetch_queue</code> and handed to the background threads.'],
+  [['t4b1'], '① <b>prefetch_thread</b> calls <code>_storage_hit_query()</code> to query the number of pages hit in L3 (may differ across ranks).'],
+  [['t4m1','t4wa'], '◆ <b style="color:var(--amber)">First MIN</b>: take the minimum of storage_hit_count over <code>prefetch_hits_sync_groups</code> (set 1) → <b>the fetch range is identical on every rank</b>.'],
+  [['t4a1'], 'Requests with enough hits land in <code>prefetch_buffer</code> and proceed to the actual IO load.'],
+  [['t4b2'], '② <b>prefetch_io_aux_thread</b> uses <code>_page_transfer()</code> to move pages L3→host batch by batch; <b>each batch always produces exactly 1 PrefetchAck</b> (even on error).'],
+  [['t4a2'], 'The <b>PrefetchAck</b> of each batch is enqueued on <code>prefetch_sync_queue</code>.'],
+  [['t4b3'], '③ <b>prefetch_sync_thread</b> reduces the <b>completed_tokens</b> of each ack.'],
+  [['t4m2','t4wb'], '◆ <b style="color:var(--green)">Second MIN</b>: take the minimum of completed_tokens over <code>prefetch_completion_sync_groups</code> (set 2) → <b>the prefix that actually landed is identical on every rank</b>.'],
+  [['t4a3'], 'The unified result is enqueued on <code>ack_prefetch_queue</code> and returns to the scheduler.'],
+  [['t4b4','t4wc'], 'The scheduler inserts only a prefix of length <b>completed_tokens</b> → <code>_insert_helper_host()</code>. With exactly 1 ack per batch, <b>the reduce counts are strictly equal → no hang</b>.'],
+];
+function t4clear(){
+  document.querySelectorAll('#scene4 .lit').forEach(e=>e.classList.remove('lit'));
+  document.querySelectorAll('#scene4 .dimmed').forEach(e=>e.classList.remove('dimmed'));
+}
+async function t4gate(my){ while(t4Paused){ await sleep(120); if(my!==t4Token) throw 0; } }
+async function t4step(ms,my){ await sleep(ms); await t4gate(my); if(my!==t4Token) throw 0; }
+async function runTab4(){
+  const my=++t4Token;
+  try{
+    while(true){
+      t4clear();
+      cap4El.innerHTML='Light up each stage in turn along the data flow: two MIN sync points + exactly 1 ack per batch.';
+      await t4step(1200,my);
+      for(const [ids,cap] of T4SEQ){
+        await t4gate(my);
+        ids.forEach(id=>document.getElementById(id).classList.add('lit'));
+        cap4El.innerHTML=cap;
+        await t4step(1900,my);
+      }
+      cap4El.innerHTML='✅ Closed loop: two MINs (set 1 hit count + set 2 completed count) + 1 ack per batch → <b style="color:var(--green)">the host radix trees on all PP ranks are strictly identical</b>.';
+      await t4step(2600,my);
+    }
+  }catch(e){ /* cancelled */ }
+}
+function stopTab4(){ ++t4Token; }
+
+document.getElementById('play4').onclick=function(){
+  t4Paused=!t4Paused;
+  this.textContent=t4Paused?'▶ Play':'⏸ Pause';
+};
+document.getElementById('replay4').onclick=()=>{ t4Paused=false; document.getElementById('play4').textContent='⏸ Pause'; runTab4(); };
+
+/* ============================================================
+   TAB 5 : two-request full lifecycle (PP=3 × TP=4).
+   Req A misses (GPU compute → L2 insert → L3 backup), then L2 is
+   evicted (delete, identical across ranks); Req B hits L3 and goes
+   through the two MIN syncs so every PP rank inserts the SAME prefix
+   into its host radix tree → trees stay consistent, no deadlock.
+   ============================================================ */
+const NPG=4;                       // pages tracked in the story
+const RANK_NAMES=['PP rank 0','PP rank 1','PP rank 2'];
+(function buildStory(){
+  let h='';
+  for(let p=0;p<3;p++){
+    let tps=''; for(let t=0;t<4;t++) tps+=`<span class="tp" id="s5tp-${p}-${t}"></span>`;
+    let nodes='<span class="root">host root</span>';
+    for(let i=0;i<NPG;i++) nodes+=`<div class="htnode" id="s5n-${p}-${i}">p${i}</div>`;
+    h+=`<div class="ranklane" id="s5lane${p}">
+      <div class="rankhdr"><span class="rname">${RANK_NAMES[p]}</span><span class="tps">${tps}</span>
+        <span class="rstat" id="s5stat${p}">idle</span></div>
+      <div class="htree">${nodes}</div></div>`;
+  }
+  document.getElementById('ranks').innerHTML=h;
+  let l3=''; for(let i=0;i<NPG;i++) l3+=`<div class="pg" id="s5l3-${i}">p${i}</div>`;
+  document.getElementById('l3pages').innerHTML=l3;
+})();
+
+let t5Token=0, t5Paused=false;
+const cap5=document.getElementById('cap5');
+const s5n=(p,i)=>document.getElementById(`s5n-${p}-${i}`);
+const setNode=(p,i,cls)=>{ s5n(p,i).className='htnode show '+cls; };
+const hideNode=(p,i)=>{ s5n(p,i).className='htnode'; };
+function rstat(p,txt,cls){ const e=document.getElementById('s5stat'+p); e.className='rstat '+(cls||''); e.textContent=txt; }
+function s5flag(txt,cls){ const e=document.getElementById('s5flag'); e.className='consist-flag '+(cls||''); e.innerHTML=txt; }
+function s5reset(){
+  for(let p=0;p<3;p++){
+    document.getElementById('s5lane'+p).className='ranklane';
+    rstat(p,'idle','');
+    for(let t=0;t<4;t++) document.getElementById(`s5tp-${p}-${t}`).className='tp';
+    for(let i=0;i<NPG;i++) hideNode(p,i);
+  }
+  for(let i=0;i<NPG;i++) document.getElementById('s5l3-'+i).className='pg';
+  document.getElementById('gpuBadge').className='gpu-badge';
+  document.getElementById('l3box').className='l3box';
+  document.getElementById('l3badge').className='badge'; document.getElementById('l3badge').textContent='';
+  document.getElementById('s5sync1').className='syncbadge g1';
+  document.getElementById('s5sync2').className='syncbadge g2';
+  s5flag('','');
+}
+async function t5gate(my){ while(t5Paused){ await sleep(120); if(my!==t5Token) throw 0; } }
+async function t5step(ms,my){ await sleep(ms); await t5gate(my); if(my!==t5Token) throw 0; }
+const allRanks=fn=>{ for(let p=0;p<3;p++) fn(p); };
+
+async function runTab5(){
+  const my=++t5Token;
+  try{
+    while(true){
+      s5reset();
+      cap5.innerHTML='Scenario <b>PP=3 × TP=4</b>: each PP rank maintains an <b>L2 host radix tree</b>, all sharing the underlying <b>L3 persistent storage</b>. Follow two requests to see how the host trees stay consistent.';
+      await t5step(2000,my);
+
+      /* ===== ACT 1 : Request A — miss → GPU → L2 insert → L3 backup ===== */
+      cap5.innerHTML='① <b>Request A</b> arrives (needs a 4-page prefix) and is handled by all 3 PP ranks at once.';
+      allRanks(p=>{ document.getElementById('s5lane'+p).classList.add('active'); for(let t=0;t<4;t++) document.getElementById(`s5tp-${p}-${t}`).className='tp on'; rstat(p,'req A',''); });
+      await t5step(1700,my);
+
+      cap5.innerHTML='① Look up the L2 host tree → <b style="color:var(--red)">miss</b>; look up L3 → <b style="color:var(--red)">miss</b> (storage is empty).';
+      allRanks(p=>rstat(p,'L2/L3 miss','miss'));
+      document.getElementById('l3badge').className='badge miss'; document.getElementById('l3badge').textContent='miss';
+      await t5step(1900,my);
+
+      cap5.innerHTML='① Fall back to the <b>GPU forward pass</b> to compute the KV for these 4 pages.';
+      document.getElementById('gpuBadge').classList.add('busy');
+      allRanks(p=>rstat(p,'compute','warn'));
+      await t5step(1800,my);
+      document.getElementById('gpuBadge').classList.remove('busy');
+
+      cap5.innerHTML='① The results are written into the <b>L2 host radix tree</b> → all 3 ranks <code>insert</code> the <strong style="color:var(--green)">same</strong> prefix p0–p3.';
+      for(let i=0;i<NPG;i++){ allRanks(p=>setNode(p,i,'inserting')); await t5step(240,my); }
+      allRanks(p=>{ for(let i=0;i<NPG;i++) setNode(p,i,'committed'); rstat(p,'L2: 4','hit'); });
+      s5flag('✓ all 3 host trees insert 4 pages in lockstep (consistent)','ok');
+      await t5step(1600,my);
+
+      cap5.innerHTML='① The backup thread persists L2 → <b>L3</b> (<code>write_backup</code> / <code>page_set</code>).';
+      document.getElementById('l3box').classList.add('hot');
+      document.getElementById('l3badge').className='badge hit'; document.getElementById('l3badge').textContent='stored';
+      for(let i=0;i<NPG;i++){ document.getElementById('s5l3-'+i).className='pg show l3'; await t5step(220,my); }
+      s5flag('','');
+      await t5step(1500,my);
+
+      /* ===== ACT 1.5 : L2 eviction (delete consistency) ===== */
+      cap5.innerHTML='② Host memory pressure → L2 triggers <strong style="color:var(--red)">eviction</strong> (<code>evict_host</code>). The 3 host trees are <b>identical</b> → eviction hits the <strong>same set of nodes</strong>; L3 keeps its copy.';
+      for(let i=NPG-1;i>=0;i--){ allRanks(p=>setNode(p,i,'evict')); await t5step(330,my); allRanks(p=>hideNode(p,i)); }
+      allRanks(p=>rstat(p,'L2 empty',''));
+      s5flag('✓ all 3 host trees delete in lockstep (deletes are consistent)','ok');
+      await t5step(2000,my);
+      s5flag('','');
+
+      /* ===== ACT 2 : Request B — L3 hit → 2 MIN syncs → consistent insert ===== */
+      cap5.innerHTML='③ <b>Request B</b> arrives (reusing the prefix of A). The L2 host tree is now empty → <b style="color:var(--red)">L2 miss</b>, so the lookup falls through to L3.';
+      allRanks(p=>{ document.getElementById('s5lane'+p).classList.add('active'); rstat(p,'req B','warn'); });
+      await t5step(1700,my);
+
+      cap5.innerHTML='③ <b>prefetch_thread</b>: each rank queries L3 for its hit page count → results may <strong style="color:var(--amber)">differ</strong> (differences in host view / memory): 4 / 3 / 4.';
+      const hitq=[4,3,4];
+      allRanks(p=>{ rstat(p,'L3 hit '+hitq[p], hitq[p]===4?'hit':'warn'); for(let i=0;i<hitq[p];i++) setNode(p,i,'warn'); });
+      s5flag('⚠ query lengths disagree (4/3/4) → if each rank builds its own tree, the host trees diverge','bad');
+      await t5step(2600,my);
+
+      cap5.innerHTML='◆ <b>First MIN</b> @ <code>prefetch_hits_sync_groups</code> (set 1, gloo/CPU, covering TP rings + PP rings): <code>all_reduce(MIN)</code> unifies the query length = <b>3</b>.';
+      document.getElementById('s5sync1').classList.add('fire');
+      await t5step(1500,my);
+      allRanks(p=>{ for(let i=0;i<NPG;i++) hideNode(p,i); for(let i=0;i<3;i++) setNode(p,i,'matched'); rstat(p,'match 3','hit'); });
+      s5flag('✓ fetch range unified = 3 → match_prefix is identical on every rank','ok');
+      await t5step(2200,my);
+
+      cap5.innerHTML='③ <b>prefetch_io_aux_thread</b> pulls pages from L3 back into L2 batch by batch (<code>_page_transfer</code>), producing 1 PrefetchAck per batch.';
+      document.getElementById('s5sync1').classList.remove('fire');
+      for(let i=0;i<3;i++){ allRanks(p=>setNode(p,i,'inserting')); await t5step(300,my); }
+      await t5step(700,my);
+
+      cap5.innerHTML='③ Page-by-page loading <strong style="color:var(--amber)">partially fails</strong>: on rank2, <code>page_get</code> for the 3rd page does not succeed → completed_tokens = 3 / 3 / 2.';
+      const done=[3,3,2];
+      allRanks(p=>{ for(let i=0;i<3;i++){ if(i<done[p]) setNode(p,i,'committed'); else setNode(p,i,'warn'); } rstat(p,'done '+done[p], done[p]===3?'hit':'warn'); });
+      s5flag('⚠ what actually landed disagrees (3/3/2) → if each rank inserts its own result, the host trees diverge','bad');
+      await t5step(2600,my);
+
+      cap5.innerHTML='◆ <b>Second MIN</b> @ <code>prefetch_completion_sync_groups</code> (set 2, <b>independent communicator</b>): <code>all_reduce(MIN)</code> unifies completed_tokens = <b>2</b>.';
+      document.getElementById('s5sync2').classList.add('fire');
+      await t5step(1500,my);
+
+      cap5.innerHTML='③ Each rank inserts only the unified <b>2 pages</b> into its L2 host tree (<code>_insert_helper_host</code>) → the 3 host trees are <strong style="color:var(--green)">once again identical</strong>.';
+      allRanks(p=>{ for(let i=0;i<NPG;i++) hideNode(p,i); for(let i=0;i<2;i++) setNode(p,i,'committed'); rstat(p,'L2: 2','hit'); });
+      s5flag('✓ insert length unified = 2 → the 3 host trees are identical','ok');
+      await t5step(2400,my);
+
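+      cap5.innerHTML='✅ Two <strong>independent gloo group sets</strong> (set 1 for hit counts, set 2 for completed counts) + exactly 1 ack per batch → every rank applies <strong>identical inserts/deletes</strong> to its host tree → <b style="color:var(--green)">the host radix trees stay consistent and the background collectives cannot deadlock</b>.';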
+      document.getElementById('s5sync1').classList.add('fire');
+      await t5step(3400,my);
+      document.getElementById('s5sync1').classList.remove('fire');
+      document.getElementById('s5sync2').classList.remove('fire');
+      await t5step(700,my);
+    }
+  }catch(e){ /* cancelled */ }
+}
+function stopTab5(){ ++t5Token; }
+
+document.getElementById('play5').onclick=function(){
+  t5Paused=!t5Paused;
+  this.textContent=t5Paused?'▶ Play':'⏸ Pause';
+};
+document.getElementById('replay5').onclick=()=>{ t5Paused=false; document.getElementById('play5').textContent='⏸ Pause'; runTab5(); };
+
+/* ============================================================
+   TAB 6 : PrefetchAck count alignment & anti-hang.
+   Each storage batch in _page_transfer emits exactly one PrefetchAck,
+   and prefetch_sync_thread does one all_reduce(MIN) on set-2 per ack.
+   So #acks == #batches == #set-2 collectives, and it must be equal on
+   every rank. We compare: (good) each batch always emits 1 ack even on
+   error → counts aligned → safe; (bad) break-on-error drops an ack →
+   one rank does fewer reduces → the others block forever → hang.
+   ============================================================ */
+const T6NB=3;                 // number of storage batches / acks
+(function buildAck(){
+  let m='';
+  for(let p=0;p<3;p++){
+    let slots='';
+    for(let k=0;k<T6NB;k++) slots+=`<div class="ackchip pending" id="ack-${p}-${k}">ack${k}</div>`;
+    m+=`<div class="ackrow" id="ackrow${p}"><div class="acklabel"><b>PP rank ${p}</b> · _page_transfer</div><div class="ackslots">${slots}</div></div>`;
+  }
+  document.getElementById('ackmesh').innerHTML=m;
+  let b='';
+  for(let k=0;k<T6NB;k++) b+=`<div class="bar" id="bar${k}">barrier ${k}<span class="bcount" id="bcnt${k}">0/3</span></div>`;
+  document.getElementById('barcols').innerHTML=b;
+})();
+
+let t6Token=0;
+const cap6=document.getElementById('cap6');
+const banner6=document.getElementById('banner6');
+const ackEl=(p,k)=>document.getElementById(`ack-${p}-${k}`);
+const barEl=k=>document.getElementById('bar'+k);
+const bcnt=k=>document.getElementById('bcnt'+k);
+function t6reset(){
+  ++t6Token;
+  for(let p=0;p<3;p++){
+    document.getElementById('ackrow'+p).className='ackrow';
+    for(let k=0;k<T6NB;k++){ ackEl(p,k).className='ackchip pending'; ackEl(p,k).innerHTML=`ack${k}`; }
+  }
+  for(let k=0;k<T6NB;k++){ barEl(k).className='bar'; bcnt(k).textContent='0/3'; }
+  banner6.className='banner'; banner6.textContent='';
+}
+async function s6(ms,my){ await sleep(ms); if(my!==t6Token) throw 0; }
+
+// nacks: how many acks each rank produces (rank index -> count)
+async function playAck(nacks, label){
+  t6reset(); const my=t6Token;
+  try{
+    cap6.innerHTML=label;
+    await s6(700,my);
+    for(let k=0;k<T6NB;k++){
+      barEl(k).classList.add('waiting');
+      let arrived=0;
+      // ranks emit ack k one by one (async arrival)
+      for(let p=0;p<3;p++){
+        await s6(520,my);
+        if(k<nacks[p]){
+          ackEl(p,k).className='ackchip emit';
+          arrived++; bcnt(k).textContent=arrived+'/3';
+        }
+      }
+      await s6(400,my);
+      if(arrived===3){
+        barEl(k).className='bar fired'; bcnt(k).textContent='3/3 ✓';
+        for(let p=0;p<3;p++) ackEl(p,k).className='ackchip passed';
+        cap6.innerHTML=(window.TR||((z,e)=>z))(
+          `barrier ${k}：3 个 rank 的 ack 都到齐 → <code>all_reduce(MIN)</code> 返回 → 本轮通过。`,
+          `barrier ${k}: all 3 ranks' acks arrived → <code>all_reduce(MIN)</code> returns → this round passes.`);
+        await s6(700,my);
+      }else{
+        // a rank is missing this ack -> collective can never complete
+        barEl(k).className='bar dead'; bcnt(k).textContent=arrived+'/3 ✗';
+        for(let p=0;p<3;p++){
+          if(k<nacks[p]){ ackEl(p,k).className='ackchip wait'; document.getElementById('ackrow'+p).classList.add('blocked'); }
+          else { ackEl(p,k).className='ackchip missing'; ackEl(p,k).innerHTML=`ack${k}<span class="err">${(window.TR||((z,e)=>z))('缺失','missing')}</span>`; }
+        }
+        cap6.innerHTML=(window.TR||((z,e)=>z))(
+          `barrier ${k}：只有 <b>${arrived}/3</b> 个 rank 进了 <code>all_reduce</code>（有 rank 早早 break、少产一个 ack）→ 已到达的 rank <b style="color:var(--amber)">永远阻塞</b>在这次 collective 上。`,
+          `barrier ${k}: only <b>${arrived}/3</b> ranks entered <code>all_reduce</code> (a rank broke early and emitted one fewer ack) → the ranks that arrived are <b style="color:var(--amber)">blocked forever</b> on this collective.`);
+        banner6.className='banner bad'; banner6.textContent='💥 HANG: set-2 reduce counts disagree (3/3/2) → the collective never returns';
+        return;
+      }
+    }
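+    banner6.className='banner ok'; banner6.textContent='✅ Safe: every rank performed '+T6NB+' reduces; the counts are strictly equal and all ranks are aligned';
+    cap6.innerHTML='Every batch always produces 1 ack (even on error) → <b>ack counts are equal on every rank</b> → the set-2 collectives pair up one to one → no hang.';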
+  }catch(e){ /* cancelled */ }
+}
+function stopTab6(){ ++t6Token; }
+document.getElementById('play6good').onclick=()=>playAck([3,3,3],
+  '<b style="color:var(--green)">正确</b>：即便某 batch 出错，<code>_page_transfer</code> 也<strong>继续循环、照常产 ack</strong> → 三个 rank 都产 3 个 ack。');
+document.getElementById('play6bad').onclick=()=>playAck([3,3,2],
+  '<b style="color:var(--red)">错误（反面教材）</b>：rank2 在 batch2 出错就 <code>break</code> → 只产 2 个 ack，比别人少一个。');
+document.getElementById('reset6').onclick=()=>{ t6reset(); cap6.innerHTML='选择场景：<b>每 batch 恒 1 ack</b> → 次数对齐、安全；<b>出错就 break</b> → ack 缺一个 → 组2 reduce 错位 → hang。'; };
+
+/* ---------- tab switching ---------- */
+const ctl1=document.querySelectorAll('.controls')[1]; // [0] is now ctl5 (story)
+const ctl2=document.getElementById('ctl2');
+const ctl3=document.getElementById('ctl3');
+const ctl4=document.getElementById('ctl4');
+const ctl5=document.getElementById('ctl5');
+const ctl6=document.getElementById('ctl6');
+document.querySelectorAll('.tab').forEach(tab=>{
+  tab.onclick=()=>{
+    document.querySelectorAll('.tab').forEach(x=>x.classList.remove('active'));
+    tab.classList.add('active');
+    const w=tab.dataset.tab;
+    document.getElementById('scene5').classList.toggle('hidden', w!=='story');
+    document.getElementById('scene1').classList.toggle('hidden', w!=='consistency');
+    document.getElementById('scene2').classList.toggle('hidden', w!=='deadlock');
+    document.getElementById('scene3').classList.toggle('hidden', w!=='skew');
+    document.getElementById('scene4').classList.toggle('hidden', w!=='threads');
+    document.getElementById('scene6').classList.toggle('hidden', w!=='ackalign');
+    ctl5.style.display = w==='story'?'flex':'none';
+    ctl1.style.display = w==='consistency'?'flex':'none';
+    ctl2.style.display = w==='deadlock'?'flex':'none';
+    ctl3.style.display = w==='skew'?'flex':'none';
+    ctl4.style.display = w==='threads'?'flex':'none';
+    ctl6.style.display = w==='ackalign'?'flex':'none';
+    if(w==='story'){ ++t1Token; stopTab3(); stopTab4(); stopTab6(); t5Paused=false; document.getElementById('play5').textContent='⏸ 暂停'; runTab5(); }
+    else if(w==='consistency'){ t1Paused=false; document.getElementById('play1').textContent='⏸ 暂停'; runTab1(); stopTab3(); stopTab4(); stopTab5(); stopTab6(); }
+    else if(w==='skew'){ ++t1Token; startTab3(true); stopTab4(); stopTab5(); stopTab6(); }
+    else if(w==='threads'){ ++t1Token; stopTab3(); stopTab5(); stopTab6(); t4Paused=false; document.getElementById('play4').textContent='⏸ 暂停'; runTab4(); }
+    else if(w==='ackalign'){ ++t1Token; stopTab3(); stopTab4(); stopTab5(); t6reset(); }
+    else{ ++t1Token; stopTab3(); stopTab4(); stopTab5(); stopTab6(); }
+  };
+});
+ctl1.style.display='none';
+ctl2.style.display='none';
+ctl3.style.display='none';
+ctl4.style.display='none';
+ctl6.style.display='none';
+resetMesh2();
+runTab5();
+</script>
+
+<script>
+/* ============================================================
+   i18n: translate by text-content (robust to innerHTML normalization).
+   PAIRS = [zhHTML, enHTML]; keys derived from stripped textContent.
+   A MutationObserver re-translates dynamic captions on the fly.
+   ============================================================ */
+(function(){
+  const PAIRS = [
+    // header
+    ['HiCache × Pipeline Parallel：树一致性 & 防死锁','HiCache × Pipeline Parallel: Tree Consistency & Deadlock Avoidance'],
+    ['拓扑 <b>PP=3 × TP=8 = 24 ranks</b> · 行=TP 组、列=PP 组 · MIN all-reduce 保证 radix tree 一致 · 2 套 gloo 组避免后台 collective 死锁',
+     'Topology <b>PP=3 × TP=8 = 24 ranks</b> · rows = TP groups, cols = PP groups · MIN all-reduce keeps radix trees identical · 2 gloo group-sets avoid background-collective deadlock'],
+    // tabs
+    ['① 两请求全流程（L3 命中/未命中 · host tree 一致）','① Two-Request Lifecycle (L3 miss/hit · host-tree consistency)'],
+    ['② 树一致性（自动播放）','② Tree Consistency (auto-play)'],
+    ['③ 为什么 2 个组不死锁','③ Why 2 Groups Avoid Deadlock'],
+    ['④ 异步时间差 × MIN 统一步调','④ Async Skew × MIN Lockstep'],
+    ['⑤ 线程关系 & 树一致性','⑤ Thread Relationships & Consistency'],
+    ['⑥ PrefetchAck 对齐 & 防 hang','⑥ PrefetchAck Alignment & Anti-Hang'],
+    // tab6 note / legend / barrier label / scenario captions / banners
+    ['每个 <b>storage batch</b> 在 <code>_page_transfer</code> 里恒产 <b>1 个 PrefetchAck</b>；<code>prefetch_sync_thread</code> 对<strong>每个 ack</strong> 在组2 做一次 <code>all_reduce(MIN)</code>。所以 <b>ack 数 = batch 数 = 组2 collective 次数</b>，必须逐 rank 相等。',
+     'Each <b>storage batch</b> in <code>_page_transfer</code> always emits <b>exactly 1 PrefetchAck</b>; <code>prefetch_sync_thread</code> does one <code>all_reduce(MIN)</code> on set 2 <strong>per ack</strong>. So <b>#acks = #batches = #set-2 collectives</b>, and that count must be equal on every rank.'],
+    ['<span class="sw" style="background:var(--blue)"></span>ack 已产出（参与本轮 reduce）','<span class="sw" style="background:var(--blue)"></span>ack emitted (joins this reduce)'],
+    ['<span class="sw" style="background:var(--green)"></span>barrier 凑齐 3/3 → 通过','<span class="sw" style="background:var(--green)"></span>barrier reaches 3/3 → pass'],
+    ['<span class="sw" style="background:var(--amber)"></span>已到达，等待缺席方','<span class="sw" style="background:var(--amber)"></span>arrived, waiting for the absent rank'],
+    ['<span class="sw" style="background:var(--red)"></span>缺失 ack → 永远等不到','<span class="sw" style="background:var(--red)"></span>missing ack → never arrives'],
+    ['◆ <code>all_reduce(MIN)</code> @ 组2（prefetch_completion_sync_groups）· 每个 ack 一次 barrier',
+     '◆ <code>all_reduce(MIN)</code> @ set 2 (prefetch_completion_sync_groups) · one barrier per ack'],
+    ['选择场景：<b>每 batch 恒 1 ack</b> → 次数对齐、安全；<b>出错就 break</b> → ack 缺一个 → 组2 reduce 错位 → hang。',
+     'Pick a scenario: <b>one ack per batch</b> → counts aligned, safe; <b>break on error</b> → one ack missing → set-2 reduces misalign → hang.'],
+    ['<b style="color:var(--green)">正确</b>：即便某 batch 出错，<code>_page_transfer</code> 也<strong>继续循环、照常产 ack</strong> → 三个 rank 都产 3 个 ack。',
+     '<b style="color:var(--green)">Correct</b>: even if a batch errors, <code>_page_transfer</code> <strong>keeps looping and still emits the ack</strong> → all three ranks emit 3 acks.'],
+    ['<b style="color:var(--red)">错误（反面教材）</b>：rank2 在 batch2 出错就 <code>break</code> → 只产 2 个 ack，比别人少一个。',
+     '<b style="color:var(--red)">Wrong (anti-pattern)</b>: rank2 hits an error at batch2 and <code>break</code>s → emits only 2 acks, one fewer than the others.'],
+    ['▶ 正确（每 batch 恒 1 ack）','▶ Correct (one ack per batch)'],
+    ['▶ 错误（出错 break → ack 缺失）','▶ Wrong (break on error → missing ack)'],
+    ['每个 batch 恒产 1 个 ack（出错也产）→ <b>ack 数逐 rank 相等</b> → 组2 的 collective 一一对应 → 不会 hang。',
+     'Each batch always emits one ack (even on error) → <b>ack counts equal across ranks</b> → set-2 collectives match one-to-one → no hang.'],
+    ['✅ 安全：每 rank 都做了 3 次 reduce，次数严格相等，全部对齐完成','✅ Safe: every rank did 3 reduces, counts strictly equal, all aligned and complete'],
+    ['💥 HANG：组2 reduce 次数不一致（3/3/2）→ collective 永不返回','💥 HANG: set-2 reduce counts differ (3/3/2) → the collective never returns'],
+    // tab5 legend chips
+    ['<span class="sw" style="background:var(--blue)"></span>GPU 计算 / 插入中','<span class="sw" style="background:var(--blue)"></span>GPU compute / inserting'],
+    ['<span class="sw" style="background:var(--cyan)"></span>match 命中前缀','<span class="sw" style="background:var(--cyan)"></span>matched prefix'],
+    ['<span class="sw" style="background:var(--amber)"></span>各 rank 不一致（待 MIN 统一）','<span class="sw" style="background:var(--amber)"></span>diverged per rank (await MIN)'],
+    ['<span class="sw" style="background:var(--green)"></span>已提交 / 一致','<span class="sw" style="background:var(--green)"></span>committed / consistent'],
+    ['<span class="sw" style="background:var(--red)"></span>未命中 / 淘汰删除','<span class="sw" style="background:var(--red)"></span>miss / evicted'],
+    // tab5 static labels
+    ['<b>L3 持久化存储</b>（storage backend，3 个 rank 共享视图）',
+     '<b>L3 persistent storage</b> (storage backend, shared view across 3 ranks)'],
+    ['GPU 计算','GPU compute'],
+    ['◆ MIN 组1 · prefetch_hits_sync_groups · storage_hit_count','◆ MIN set 1 · prefetch_hits_sync_groups · storage_hit_count'],
+    ['◆ MIN 组2 · prefetch_completion_sync_groups · completed_tokens','◆ MIN set 2 · prefetch_completion_sync_groups · completed_tokens'],
+    // tab5 consistency flags
+    ['✓ 3 棵 host tree 同步插入 4 个 page（一致）','✓ all 3 host trees insert 4 pages in sync (consistent)'],
+    ['✓ 3 棵 host tree 同步删除（delete 一致）','✓ all 3 host trees delete in sync (consistent)'],
+    ['⚠ 查询长度不一致（4/3/4）→ 若各自建树，host tree 会发散','⚠ query lengths differ (4/3/4) → building trees independently diverges them'],
+    ['✓ 抓取范围统一 = 3 → match_prefix 逐 rank 一致','✓ fetch range unified = 3 → match_prefix identical per rank'],
+    ['⚠ 实际落盘不一致（3/3/2）→ 若各自插入，host tree 会发散','⚠ actual loads differ (3/3/2) → inserting independently diverges trees'],
+    ['✓ 插入长度统一 = 2 → 3 棵 host tree 完全一致','✓ insert length unified = 2 → all 3 host trees identical'],
+    // tab5 step captions
+    ['场景 <b>PP=3 × TP=4</b>：每个 PP rank 维护一棵 <b>L2 host radix tree</b>，共享底层 <b>L3 持久化存储</b>。跟踪两个请求，看 host tree 如何保持一致。',
+     'Scenario <b>PP=3 × TP=4</b>: each PP rank keeps an <b>L2 host radix tree</b> over a shared <b>L3 persistent storage</b>. We follow two requests and see how the trees stay consistent.'],
+    ['① <b>请求 A</b> 到达（需要 4 个 page 的前缀），3 个 PP rank 同时处理。',
+     '① <b>Request A</b> arrives (needs a 4-page prefix); all 3 PP ranks process it together.'],
+    ['① 查 L2 host tree → <b style="color:var(--red)">miss</b>；查 L3 → <b style="color:var(--red)">miss</b>（存储为空）。',
+     '① Query L2 host tree → <b style="color:var(--red)">miss</b>; query L3 → <b style="color:var(--red)">miss</b> (storage empty).'],
+    ['① 回退到 <b>GPU 前向计算</b>，生成这 4 个 page 的 KV。',
+     '① Fall back to <b>GPU forward compute</b> to produce the KV for these 4 pages.'],
+    ['① 计算结果写入 <b>L2 host radix tree</b> → 3 个 rank <code>insert</code> <strong style="color:var(--green)">相同</strong>的前缀 p0–p3。',
+     '① Results are written into the <b>L2 host radix tree</b> → all 3 ranks <code>insert</code> the <strong style="color:var(--green)">same</strong> prefix p0–p3.'],
+    ['① backup 线程把 L2 → <b>L3</b> 持久化（<code>write_backup</code> / <code>page_set</code>）。',
+     '① The backup thread persists L2 → <b>L3</b> (<code>write_backup</code> / <code>page_set</code>).'],
+    ['② host 内存压力 → L2 触发<strong style="color:var(--red)">淘汰</strong>（<code>evict_host</code>）。3 棵 host tree <b>完全一致</b> → 淘汰命中<strong>同一批节点</strong>；L3 仍保留。',
+     '② Host-memory pressure → L2 <strong style="color:var(--red)">eviction</strong> (<code>evict_host</code>). The 3 host trees are <b>identical</b> → eviction hits the <strong>same nodes</strong>; L3 keeps them.'],
+    ['③ <b>请求 B</b> 到达（复用 A 的前缀）。L2 host tree 已空 → <b style="color:var(--red)">L2 miss</b>，转向 L3。',
+     '③ <b>Request B</b> arrives (reuses A\u2019s prefix). The L2 host tree is empty → <b style="color:var(--red)">L2 miss</b>, fall through to L3.'],
+    ['③ <b>prefetch_thread</b> 各 rank 向 L3 查命中页数 → 结果可能<strong style="color:var(--amber)">不同</strong>（host 视图/内存差异）：4 / 3 / 4。',
+     '③ <b>prefetch_thread</b> on each rank queries L3 for its hit-page count → results may <strong style="color:var(--amber)">differ</strong> (host view / memory differences): 4 / 3 / 4.'],
+    ['◆ <b>第一个 MIN</b> @ <code>prefetch_hits_sync_groups</code>（组1，gloo/CPU，含 TP环+PP环）：<code>all_reduce(MIN)</code> 统一查询长度 = <b>3</b>。',
+     '◆ <b>First MIN</b> @ <code>prefetch_hits_sync_groups</code> (set 1, gloo/CPU, TP+PP rings): <code>all_reduce(MIN)</code> unifies the query length = <b>3</b>.'],
+    ['③ <b>prefetch_io_aux_thread</b> 逐 batch 把 page 从 L3 拉回 L2（<code>_page_transfer</code>），每 batch 产 1 个 PrefetchAck。',
+     '③ <b>prefetch_io_aux_thread</b> pulls pages L3→L2 batch by batch (<code>_page_transfer</code>), one PrefetchAck per batch.'],
+    ['③ 逐页加载<strong style="color:var(--amber)">部分失败</strong>：rank2 第 3 页 <code>page_get</code> 未成功 → completed_tokens = 3 / 3 / 2。',
+     '③ Per-page load <strong style="color:var(--amber)">partially fails</strong>: rank2\u2019s 3rd page <code>page_get</code> fails → completed_tokens = 3 / 3 / 2.'],
+    ['◆ <b>第二个 MIN</b> @ <code>prefetch_completion_sync_groups</code>（组2，<b>独立 communicator</b>）：<code>all_reduce(MIN)</code> 统一 completed_tokens = <b>2</b>。',
+     '◆ <b>Second MIN</b> @ <code>prefetch_completion_sync_groups</code> (set 2, <b>independent communicator</b>): <code>all_reduce(MIN)</code> unifies completed_tokens = <b>2</b>.'],
+    ['③ 各 rank 只把统一的 <b>2 个 page</b> 插入 L2 host tree（<code>_insert_helper_host</code>）→ 3 棵 host tree <strong style="color:var(--green)">再次完全一致</strong>。',
+     '③ Each rank inserts only the unified <b>2 pages</b> into its L2 host tree (<code>_insert_helper_host</code>) → all 3 host trees are <strong style="color:var(--green)">identical again</strong>.'],
+    ['✅ 两套<strong>独立 gloo 组</strong>（组1 命中数、组2 完成数）+ 每 batch 恒 1 个 ack → 各 rank 对 host tree 的<strong>插入/删除完全一致</strong> → <b style="color:var(--green)">host radix tree 始终一致，后台 collective 不会死锁</b>。',
+     '✅ Two <strong>independent gloo group-sets</strong> (set 1 hit count, set 2 completed tokens) + exactly one ack per batch → every rank\u2019s <strong>inserts/deletes are identical</strong> → <b style="color:var(--green)">host radix trees stay consistent and background collectives never deadlock</b>.'],
+    // buttons
+    ['⏸ 暂停','⏸ Pause'],['▶ 播放','▶ Play'],['⟲ 重播','⟲ Replay'],['重置','Reset'],
+    ['▶ 1 套组（死锁）','▶ 1 group set (deadlock)'],['▶ 2 套组（安全）','▶ 2 group sets (safe)'],
+    // tab1 legend + tree title + init caption
+    ['<span class="sw" style="background:var(--amber)"></span>命中数被截断（不一致）','<span class="sw" style="background:var(--amber)"></span>hit count truncated (inconsistent)'],
+    ['<span class="sw" style="background:var(--blue)"></span>TP 组内 MIN 后','<span class="sw" style="background:var(--blue)"></span>after MIN within TP group'],
+    ['<span class="sw" style="background:var(--green)"></span>PP 组内 MIN 后（全局一致）','<span class="sw" style="background:var(--green)"></span>after MIN within PP group (global)'],
+    ['所有 24 个 rank 共享同一棵 radix tree','all 24 ranks share one radix tree'],
+    ['自动播放中…','auto-playing…'],
+    // tab1 captions
+    ['拓扑 <b>PP=3 × TP=8 = 24 个 rank</b>：每个 PP stage 下挂 8 个 TP rank。',
+     'Topology <b>PP=3 × TP=8 = 24 ranks</b>: each PP stage holds 8 TP ranks.'],
+    ['① 各 rank <span class="k">独立</span>向 L3 查询前缀命中。<b style="color:var(--amber)">注意 r10、r15 因 host 内存压力被截断</b>（6 / 7 页）。',
+     '① Each rank <span class="k">independently</span> queries L3 for prefix hits. <b style="color:var(--amber)">Note that r10 and r15 are truncated by host-memory pressure</b> (6 / 7 pages).'],
+    ['② 若各 rank 按自己的命中数建 radix tree → 树高不一致 → 后续 PP 集合通信 <b style="color:var(--red)">shape mismatch → crash</b>。',
+     '② If each rank builds its radix tree from its own hit count → tree heights differ → the next PP collective hits a <b style="color:var(--red)">shape mismatch → crash</b>.'],
+    ['③ 第一步：在 <span class="k">TP 组（每一行 8 个 rank）</span>内 <code>all_reduce(MIN)</code>。',
+     '③ Step 1: <code>all_reduce(MIN)</code> within each <span class="k">TP group (a row of 8 ranks)</span>.'],
+    ['③ TP 组归约后：<b>每一行变得一致</b>（PP0=8, PP1=6, PP2=8 = 各行最小值）。',
+     '③ After TP reduce: <b>each row is uniform</b> (PP0=8, PP1=6, PP2=8 = per-row min).'],
+    ['④ 第二步：在 <span class="k">PP 组（每一列 3 个 rank）</span>内 <code>all_reduce(MIN)</code> → 收敛到全局最小值。',
+     '④ Step 2: <code>all_reduce(MIN)</code> within each <span class="k">PP group (a column of 3 ranks)</span> → converge to the global minimum.'],
+    ['④ PP 组归约后：<b style="color:var(--green)">全部 24 个 rank 命中数 = 6</b>（最长公共前缀）。',
+     '④ After PP reduce: <b style="color:var(--green)">hit count = 6 on all 24 ranks</b> (longest common prefix).'],
+    ['⑤ 所有 rank 都只 prefetch / 建树到 6 → <span style="color:var(--green)">24 个 rank 的 radix tree 完全一致 ✓</span>',
+     '⑤ Every rank prefetches / builds the tree only up to 6 → <span style="color:var(--green)">all 24 radix trees are identical ✓</span>'],
+    // tab2 legend + note + groups + init + captions + banners
+    ['<span class="sw" style="background:var(--purple)"></span><b>prefetch_thread</b>（独立后台线程）· reduce(storage_hit_count)','<span class="sw" style="background:var(--purple)"></span><b>prefetch_thread</b> (independent background thread) · reduce(storage_hit_count)'],
+    ['<span class="sw" style="background:var(--cyan)"></span><b>prefetch_sync_thread</b>（独立后台线程）· reduce(completed_tokens)','<span class="sw" style="background:var(--cyan)"></span><b>prefetch_sync_thread</b> (independent background thread) · reduce(completed_tokens)'],
+    ['每个 cell = 1 个 rank，内含 2 个独立后台线程（小圆点 ●A ●B）。每一行是一个 <b>TP communicator</b>，每一列是一个 <b>PP communicator</b>。',
+     'Each cell = 1 rank, holding 2 independent background threads (dots ●A ●B). Each row is a <b>TP communicator</b>, each column a <b>PP communicator</b>.'],
+    ['<b>prefetch_hits_sync_groups</b><br>命中页数归约组（含 TP 环 + PP 环）<br><span style="font-size:11px">reduce(storage_hit_count)</span>',
+     '<b>prefetch_hits_sync_groups</b><br>hit-count reduce set (TP rings + PP rings)<br><span style="font-size:11px">reduce(storage_hit_count)</span>'],
+    ['<b>prefetch_completion_sync_groups</b><br>完成 token 归约组（含 TP 环 + PP 环）<br><span style="font-size:11px">reduce(completed_tokens)</span>',
+     '<b>prefetch_completion_sync_groups</b><br>completed-token reduce set (TP rings + PP rings)<br><span style="font-size:11px">reduce(completed_tokens)</span>'],
+    ['选择场景：用 <b>1 套组</b> 会死锁，用 <b>2 套组</b> 则安全。','Pick a scenario: <b>1 group set</b> deadlocks, <b>2 group sets</b> are safe.'],
+    ['只有 <b>1 套组</b>：prefetch_thread(A) 与 prefetch_sync_thread(B) 共用同一个 communicator 集。',
+     'Only <b>1 group set</b>: prefetch_thread(A) and prefetch_sync_thread(B) share the same communicator set.'],
+    ['两个后台线程<b>独立调度、顺序不定</b>：同一个 TP 环里，有的 rank 先发 A，有的先发 B。',
+     'The two background threads are <b>scheduled independently, order unpredictable</b>: within one TP ring some ranks post A first, others post B first.'],
+    ['同一个 communicator 上各 rank 提交的 collective <b style="color:var(--red)">不是同一个</b>（A 与 B 错位）→ rendezvous 永远配不上。',
+     'On the same communicator the collectives submitted by different ranks are <b style="color:var(--red)">not the same</b> (A vs B misaligned) → rendezvous never matches.'],
+    ['只要任一 communicator 上 A/B 交错，该环就死锁 → 全局 PP/TP 通信连环卡住。',
+     'If A and B interleave on any communicator, that ring deadlocks → the hang cascades through all PP/TP communication.'],
+    ['💥 DEADLOCK — 整个 24-rank job 卡死','💥 DEADLOCK — the whole 24-rank job hangs'],
+    ['用 <b>2 套独立组</b>：<b style="color:var(--purple)">A 永远走 prefetch_hits_sync_groups</b>，<b style="color:var(--cyan)">B 永远走 prefetch_completion_sync_groups</b>。',
+     'With <b>2 independent group sets</b>: <b style="color:var(--purple)">A always uses prefetch_hits_sync_groups</b>, <b style="color:var(--cyan)">B always uses prefetch_completion_sync_groups</b>.'],
+    ['第一波：所有 rank 的 <b>prefetch_thread</b> 只在 <code>prefetch_hits_sync_groups</code> 上提交 A → 序列一致。',
+     'Wave 1: every rank\u2019s <b>prefetch_thread</b> posts A only on <code>prefetch_hits_sync_groups</code> → consistent order.'],
+    ['✓ TP 环 + PP 环上 A 全部到齐 → 第一波归约完成。','✓ A arrives on every TP ring + PP ring → wave 1 reduce done.'],
+    ['第二波：所有 rank 的 <b>prefetch_sync_thread</b> 只在 <code>prefetch_completion_sync_groups</code> 上提交 B → 序列一致。',
+     'Wave 2: every rank\u2019s <b>prefetch_sync_thread</b> posts B only on <code>prefetch_completion_sync_groups</code> → consistent order.'],
+    ['每个 communicator 上的 collective 序列在所有 rank <b style="color:var(--green)">完全一致</b>（A→组1、B→组2 不交叉）→ 不会死锁。',
+     'The collective sequence on each communicator is <b style="color:var(--green)">identical across ranks</b> (A→set1, B→set2, never crossing) → no deadlock.'],
+    ['✅ 安全 — 24 个 rank 全部对齐完成','✅ Safe — all 24 ranks aligned and complete'],
+    // tab3 titles / conduits / flow-note / lane hint / captions / formers
+    ['③ 主 PP 流水线执行<strong>时序</strong> <span class="tag gpu">NCCL · GPU</span> <span style="color:var(--muted);font-size:11px;">时序连续、错峰流动，<strong style="color:var(--green)">不被后台 prefetch 同步打断</strong></span>',
+     '③ Main PP pipeline execution <strong>timing</strong> <span class="tag gpu">NCCL · GPU</span> <span style="color:var(--muted);font-size:11px;">continuous, staggered flow, <strong style="color:var(--green)">never interrupted by background prefetch sync</strong></span>'],
+    ['↑ 流水线跑的正是②组好的 <strong>mb0→mb3</strong>，沿 stage0→1→2 错峰对角推进',
+     '↑ The pipeline runs exactly the <strong>mb0→mb3</strong> composed in ②, advancing diagonally stage0→1→2'],
+    ['▲ 组好的 <b>batch &amp; micro-batch 顺序</b> 喂给流水线（内容）','▲ The composed <b>batch &amp; micro-batch order</b> feeds the pipeline (content)'],
+    ['② 三个 PP rank 用<strong>同一个 storage hit</strong> 组 batch（内容必须逐 rank 一致）',
+     '② The three PP ranks compose the batch from <strong>the same storage hit</strong> (content must match per rank)'],
+    ['▲ <code>all_reduce(MIN)</code> 输出统一值 <b>6</b> → 决定 batch size','▲ <code>all_reduce(MIN)</code> outputs the unified value <b>6</b> → determines batch size'],
+    ['① 异步 prefetch 查询 → <code>all_reduce(MIN)</code> <span class="tag cpu">gloo · CPU 后台线程</span>',
+     '① Async prefetch query → <code>all_reduce(MIN)</code> <span class="tag cpu">gloo · CPU background thread</span>'],
+    ['（等待②组好的 batch…）','(waiting for batch from ②…)'],
+    ['① 三个 PP rank 的 prefetch 查询<strong>异步发起</strong>（到达时刻不同）。','① The three PP ranks issue prefetch queries <strong>asynchronously</strong> (different arrival times).'],
+    ['① 先到的 rank 在 <span class="k">gloo CPU 后台线程</span>上<b style="color:var(--amber)">等待对齐</b>（不占 GPU）。',
+     '① Earlier ranks <b style="color:var(--amber)">wait to align</b> on the <span class="k">gloo CPU background thread</span> (no GPU use).'],
+    ['① <b style="color:var(--amber)">pp2 最慢</b>到达 → <code>all_reduce(MIN)</code> 把 8/6/7 <strong style="color:var(--green)">统一成 6</strong>。',
+     '① <b style="color:var(--amber)">pp2 is slowest</b> to arrive → <code>all_reduce(MIN)</code> unifies 8/6/7 <strong style="color:var(--green)">into 6</strong>.'],
+    ['② 统一后的 <b>storage hit = 6</b> 下发给各 rank 调度器 → 决定<strong>已缓存前缀长度 / batch size / micro-batch 顺序</strong>（mb0→mb3）。',
+     '② The unified <b>storage hit = 6</b> goes to each rank\u2019s scheduler → determines <strong>cached prefix length / batch size / micro-batch order</strong> (mb0→mb3).'],
+    ['③ 三个 rank 因拿到<strong style="color:var(--green)">同一个 6</strong> 而组出<strong style="color:var(--green)">完全一致的 batch 与 mb 顺序</strong>，喂给 PP 流水线；执行时序连续不被打断。<br><span style="color:var(--red)">⚠ 若 storage hit 不统一 → batch/mb 顺序逐 rank 发散 → PP 调度错位、卡死。</span>',
+     '③ Because all three ranks get <strong style="color:var(--green)">the same 6</strong>, they compose <strong style="color:var(--green)">identical batches and mb order</strong> fed to the PP pipeline; timing stays continuous.<br><span style="color:var(--red)">⚠ If storage hit weren\u2019t unified → batch/mb order diverges per rank → PP scheduling mismatch & hang.</span>'],
+    // formers
+    ['调度器 · PP rank 0','Scheduler · PP rank 0'],['调度器 · PP rank 1','Scheduler · PP rank 1'],['调度器 · PP rank 2','Scheduler · PP rank 2'],
+    ['已缓存前缀 storage hit = ','cached prefix storage hit = '],
+    [' 页 → 决定 batch 组成',' pages → determines batch'],
+    ['✓ batch & mb 顺序一致','✓ identical batch & mb order'],
+    // mesh labels
+    ['<b>PP stage 0</b><br>(TP 组)','<b>PP stage 0</b><br>(TP group)'],
+    ['<b>PP stage 1</b><br>(TP 组)','<b>PP stage 1</b><br>(TP group)'],
+    ['<b>PP stage 2</b><br>(TP 组)','<b>PP stage 2</b><br>(TP group)'],
+    ['PP 组(列)<br>每列跨 3 个 stage →','PP groups (cols)<br>each spans 3 stages →'],
+    // tab4 flow boxes
+    ['调度器 Scheduler <span class="pin">主线程</span>','Scheduler <span class="pin">main thread</span>'],
+    ['发起 prefetch 请求（writeback / load）','Issues prefetch requests (writeback / load)'],
+    ['▼ <b>prefetch_queue</b>（PrefetchOperation）','▼ <b>prefetch_queue</b> (PrefetchOperation)'],
+    ['① prefetch_thread <span class="pin">storage-hit 线程</span>','① prefetch_thread <span class="pin">storage-hit thread</span>'],
+    ['<code>_storage_hit_query()</code> 查询 L3 命中页数；命中足够→放 prefetch_buffer，不足→prefetch_revoke_queue',
+     '<code>_storage_hit_query()</code> queries L3 hit pages; enough hits → prefetch_buffer, too few → prefetch_revoke_queue'],
+    ['◆ all_reduce(MIN) storage_hit_count <small>@ prefetch_hits_sync_groups（组1，gloo/CPU，含 TP 环 + PP 环）</small>',
+     '◆ all_reduce(MIN) storage_hit_count <small>@ prefetch_hits_sync_groups (set 1, gloo/CPU, TP rings + PP rings)</small>'],
+    ['▼ <b>prefetch_buffer</b>','▼ <b>prefetch_buffer</b>'],
+    ['② prefetch_io_aux_thread <span class="pin">IO 加载线程</span>','② prefetch_io_aux_thread <span class="pin">IO load thread</span>'],
+    ['<code>_page_transfer()</code> 逐 batch 把页从 L3 读入 host；累加 <b>completed_tokens</b>；<b>每个 storage batch 产生 1 个 PrefetchAck</b>（出错也照常产生）',
+     '<code>_page_transfer()</code> loads pages L3→host batch by batch; accumulates <b>completed_tokens</b>; <b>each storage batch emits exactly 1 PrefetchAck</b> (even on error)'],
+    ['▼ <b>prefetch_sync_queue</b>（PrefetchAck）','▼ <b>prefetch_sync_queue</b> (PrefetchAck)'],
+    ['③ prefetch_sync_thread <span class="pin">completion-token 线程</span>','③ prefetch_sync_thread <span class="pin">completion-token thread</span>'],
+    ['对每个 ack 的 <b>completed_tokens</b> 做归约','Reduces <b>completed_tokens</b> of every ack'],
+    ['◆ all_reduce(MIN) completed_tokens <small>@ prefetch_completion_sync_groups（组2，gloo/CPU，含 TP 环 + PP 环）</small>',
+     '◆ all_reduce(MIN) completed_tokens <small>@ prefetch_completion_sync_groups (set 2, gloo/CPU, TP rings + PP rings)</small>'],
+    ['▼ <b>ack_prefetch_queue</b>','▼ <b>ack_prefetch_queue</b>'],
+    ['调度器写入 host radix tree','Scheduler inserts into host radix tree'],
+    ['只插入 <b>completed_tokens</b> 长度的前缀 → <code>_insert_helper_host()</code>','Inserts only the <b>completed_tokens</b>-long prefix → <code>_insert_helper_host()</code>'],
+    ['为什么 MIN(storage_hit) 一致？','Why does MIN(storage_hit) ensure consistency?'],
+    ['各 rank 命中可能不同（host 内存截断、L3 视图差异）。MIN 取<b>最长公共可命中前缀</b> → 所有 rank <b>抓取范围一致</b>，不会各抓不同长度。',
+     'Hits may differ per rank (host-mem truncation, L3 view differences). MIN takes the <b>longest common hittable prefix</b> → every rank <b>fetches the same range</b>, never different lengths.'],
+    ['为什么 MIN(completed_tokens) 一致？','Why does MIN(completed_tokens) ensure consistency?'],
+    ['即便抓取范围一致，实际逐页加载仍可能<b>部分失败</b>（<code>page_get</code> 返回 n≠batch）。MIN 只提交<b>所有 rank 都成功落盘的最长公共前缀</b> → 写入 host tree 的长度逐 rank 相同。',
+     'Even with the same fetch range, per-page loads can <b>partially fail</b> (<code>page_get</code> returns n≠batch). MIN commits only the <b>longest common prefix every rank loaded successfully</b> → identical insert length per rank.'],
+    ['为什么不会 hang？','Why no hang?'],
+    ['每个 storage batch <b>都产生且仅产生一个 PrefetchAck</b>（即使出错也照常产生）→ 每个 rank 参与的 reduce <b>次数严格相等</b>，collective 一一对齐。两个 MIN 一起保证：<b>插入 host tree 的前缀逐 rank 完全相同 → 树一致</b>。',
+     'Each storage batch <b>emits exactly one PrefetchAck</b> (even on error) → every rank joins the <b>same number of reduces</b>, collectives align one-to-one. The two MINs together guarantee: <b>the prefix inserted into the host tree is identical per rank → trees are consistent</b>.'],
+    ['两个 MIN 同步点（组1 命中数、组2 完成数）+ 每 batch 恒定 1 个 ack，共同保证 PP 各 rank 的 host radix tree 严格一致。',
+     'Two MIN sync points (set 1 = hit count, set 2 = completed tokens) + exactly one ack per batch together keep every PP rank\u2019s host radix tree strictly identical.'],
+    // tab4 animated step captions
+    ['沿数据流向下逐步点亮：两个 MIN 同步点 + 每 batch 恒定 1 个 ack。',
+     'Each step lights up in turn along the data flow: two MIN sync points + exactly one ack per batch.'],
+    ['调度器主线程把 prefetch 请求（writeback / load）放入队列，触发后台流水线。',
+     'The scheduler main thread enqueues a prefetch request (writeback / load), kicking off the background pipeline.'],
+    ['<b>PrefetchOperation</b> 入队 <code>prefetch_queue</code>，交给后台线程处理。',
+     'A <b>PrefetchOperation</b> enters <code>prefetch_queue</code>, handed to the background threads.'],
+    ['① <b>prefetch_thread</b> 调 <code>_storage_hit_query()</code> 查询 L3 命中页数（各 rank 可能不同）。',
+     '① <b>prefetch_thread</b> calls <code>_storage_hit_query()</code> to query L3 hit pages (may differ per rank).'],
+    ['◆ <b style="color:var(--amber)">第一个 MIN</b>：在 <code>prefetch_hits_sync_groups</code>（组1）对 storage_hit_count 取最小 → <b>抓取范围逐 rank 一致</b>。',
+     '◆ <b style="color:var(--amber)">First MIN</b>: take the min of storage_hit_count on <code>prefetch_hits_sync_groups</code> (set 1) → <b>the fetch range is identical per rank</b>.'],
+    ['命中足够的请求落入 <code>prefetch_buffer</code>，进入实际 IO 加载。',
+     'Requests with enough hits drop into <code>prefetch_buffer</code> for the actual IO load.'],
+    ['② <b>prefetch_io_aux_thread</b> 用 <code>_page_transfer()</code> 逐 batch 把页 L3→host；<b>每个 batch 恒产生 1 个 PrefetchAck</b>（出错也产生）。',
+     '② <b>prefetch_io_aux_thread</b> uses <code>_page_transfer()</code> to move pages L3→host batch by batch; <b>each batch always emits exactly one PrefetchAck</b> (even on error).'],
+    ['每个 batch 的 <b>PrefetchAck</b> 入队 <code>prefetch_sync_queue</code>。',
+     'Each batch\u2019s <b>PrefetchAck</b> enters <code>prefetch_sync_queue</code>.'],
+    ['③ <b>prefetch_sync_thread</b> 对每个 ack 的 <b>completed_tokens</b> 做归约。',
+     '③ <b>prefetch_sync_thread</b> reduces the <b>completed_tokens</b> of every ack.'],
+    ['◆ <b style="color:var(--green)">第二个 MIN</b>：在 <code>prefetch_completion_sync_groups</code>（组2）对 completed_tokens 取最小 → <b>真正落盘前缀逐 rank 一致</b>。',
+     '◆ <b style="color:var(--green)">Second MIN</b>: take the min of completed_tokens on <code>prefetch_completion_sync_groups</code> (set 2) → <b>the actually-loaded prefix is identical per rank</b>.'],
+    ['统一后的结果入队 <code>ack_prefetch_queue</code> 回到调度器。',
+     'The unified result enters <code>ack_prefetch_queue</code> back to the scheduler.'],
+    ['调度器只插入 <b>completed_tokens</b> 长度的前缀 → <code>_insert_helper_host()</code>。每 batch 恒 1 个 ack，<b>reduce 次数严格相等 → 不会 hang</b>。',
+     'The scheduler inserts only the <b>completed_tokens</b>-long prefix → <code>_insert_helper_host()</code>. One ack per batch means <b>equal reduce counts → no hang</b>.'],
+    ['✅ 闭环：两个 MIN（组1 命中数 + 组2 完成数）+ 每 batch 1 个 ack → <b style="color:var(--green)">PP 各 rank 的 host radix tree 严格一致</b>。',
+     '✅ Closed loop: two MINs (set 1 hit count + set 2 completed tokens) + one ack per batch → <b style="color:var(--green)">every PP rank\u2019s host radix tree is strictly identical</b>.'],
+  ];
+
+  const SEL = ['header h1','header p','.tab','.tree-title','.caption','.banner','.legend .chip',
+    '#scene2 .note','.grp','.t3-title','.clabel','.flow-note','.lane-hint','.ctl',
+    '.pp-label','.pp-foot .lab','.former h5','.former .hitbox .ht1','.former .hitbox .ht2','.former .chk',
+    '.tbox .tname','.tbox .tdesc','.tarrow','.minnode','.whycard h4','.whycard p',
+    '.gpu-badge span','.l3lab','.syncbadge','.consist-flag','#scene6 .note','.barlabel'].join(',');
+
+  const tmp=document.createElement('div');
+  const strip=h=>{ tmp.innerHTML=h; return tmp.textContent.replace(/\s+/g,' ').trim(); };
+  const EN={}, ZH={};
+  PAIRS.forEach(([zh,en])=>{ EN[strip(zh)]=en; ZH[strip(en)]=zh; });
+
+  let LANG='zh';
+  // runtime helper for dynamic strings that contain interpolated values
+  // (cannot be matched by the static dictionary). Reads the live LANG.
+  window.TR=(zh,en)=> LANG==='en' ? en : zh;
+  let mo=null;
+  let suppress=false;   // re-entrancy guard: ignore mutations we cause ourselves
+  function translateEl(el){
+    const k=strip(el.innerHTML);
+    const next = LANG==='en' ? EN[k] : ZH[k];
+    // only write when there is a real change, otherwise we churn the DOM
+    if(next!==undefined && next!==el.innerHTML) el.innerHTML=next;
+  }
+  function translateAll(){
+    suppress=true;
+    document.querySelectorAll(SEL).forEach(translateEl);
+    if(mo) mo.takeRecords();   // drop the records our own writes just generated
+    suppress=false;
+  }
+
+  window.toggleLang=function(){
+    LANG = LANG==='zh' ? 'en' : 'zh';
+    document.getElementById('langBtn').textContent = LANG==='zh' ? 'EN' : '中文';
+    document.documentElement.lang = LANG==='zh' ? 'zh-CN' : 'en';
+    translateAll();
+  };
+
+  // keep dynamic captions translated as JS rewrites them
+  mo=new MutationObserver(muts=>{
+    if(suppress) return;       // skip the mutations our own translations produced
+    suppress=true;
+    muts.forEach(m=>{
+      const tgt = m.target.nodeType===1 ? m.target : m.target.parentElement;
+      if(!tgt) return;
+      const c = tgt.closest && tgt.closest(SEL);
+      if(c) translateEl(c);
+    });
+    mo.takeRecords();
+    suppress=false;
+  });
+  mo.observe(document.body,{subtree:true,childList:true,characterData:true});
+})();
+</script>
+</body>
+</html>
+<style>#langBtn,header,.tabs{display:none!important;}body{background:#0e1117;}.wrap{padding-top:10px;}</style>
+<script>
+(function(){
+  // English-only, single-tab embed: reuse all original JS
+  try{ if(window.toggleLang) toggleLang(); }catch(e){}   // zh -> en
+  var TAB="threads";
+  var btn=document.querySelector('.tab[data-tab="'+TAB+'"]');
+  if(btn){ btn.click(); }
+})();
+</script>
diff --git a/public/images/blog/pp_hicache_consistency/l3_prefetch_problem.png b/public/images/blog/pp_hicache_consistency/l3_prefetch_problem.png
new file mode 100644
index 000000000..9c59ee898
Binary files /dev/null and b/public/images/blog/pp_hicache_consistency/l3_prefetch_problem.png differ
diff --git a/public/images/blog/pp_hicache_consistency/lifecycle.gif b/public/images/blog/pp_hicache_consistency/lifecycle.gif
new file mode 100644
index 000000000..d4c17f15c
Binary files /dev/null and b/public/images/blog/pp_hicache_consistency/lifecycle.gif differ
diff --git a/public/images/blog/pp_hicache_consistency/main_sync_flow.svg b/public/images/blog/pp_hicache_consistency/main_sync_flow.svg
new file mode 100644
index 000000000..e5d32061c
--- /dev/null
+++ b/public/images/blog/pp_hicache_consistency/main_sync_flow.svg
@@ -0,0 +1,47 @@
+<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 860 320" font-family="-apple-system,Segoe UI,Helvetica,Arial,sans-serif">
+  <defs>
+    <marker id="am" markerWidth="10" markerHeight="10" refX="7" refY="3" orient="auto">
+      <path d="M0,0 L7,3 L0,6 Z" fill="#6b7280"/>
+    </marker>
+  </defs>
+  <rect x="0" y="0" width="860" height="320" fill="#ffffff"/>
+  <text x="430" y="28" text-anchor="middle" font-size="15" font-weight="700" fill="#111827">main-branch prefetch sync — and where it falls short under PP + L3</text>
+
+  <!-- scheduler main -->
+  <rect x="30" y="56" width="200" height="58" rx="10" fill="#f3f4f6" stroke="#9ca3af" stroke-width="1.5"/>
+  <text x="130" y="80" text-anchor="middle" font-size="13" font-weight="700" fill="#374151">Scheduler (main thread)</text>
+  <text x="130" y="99" text-anchor="middle" font-size="11" fill="#6b7280">enqueue prefetch op</text>
+
+  <line x1="232" y1="85" x2="282" y2="85" stroke="#6b7280" stroke-width="1.5" marker-end="url(#am)"/>
+
+  <!-- prefetch_thread -->
+  <rect x="286" y="48" width="280" height="74" rx="10" fill="#f5f3ff" stroke="#7c3aed" stroke-width="1.5"/>
+  <text x="426" y="71" text-anchor="middle" font-size="13" font-weight="700" fill="#5b21b6">prefetch_thread (background)</text>
+  <text x="426" y="90" text-anchor="middle" font-size="11" fill="#4b5563">all_reduce(MIN) storage_hit_count</text>
+  <text x="426" y="108" text-anchor="middle" font-size="11" font-weight="700" fill="#dc2626">over TP / CP ring only — PP not covered</text>
+
+  <line x1="426" y1="122" x2="426" y2="150" stroke="#6b7280" stroke-width="1.5" marker-end="url(#am)"/>
+
+  <!-- IO load updates shared op -->
+  <rect x="286" y="152" width="280" height="66" rx="10" fill="#eff6ff" stroke="#2563eb" stroke-width="1.5"/>
+  <text x="426" y="175" text-anchor="middle" font-size="12.5" font-weight="700" fill="#1d4ed8">I/O loads pages L3 → host</text>
+  <text x="426" y="194" text-anchor="middle" font-size="11" fill="#4b5563">updates completed_tokens in place</text>
+  <text x="426" y="210" text-anchor="middle" font-size="11" fill="#4b5563">on the shared operation object</text>
+
+  <!-- main thread polls -->
+  <line x1="426" y1="218" x2="426" y2="246" stroke="#6b7280" stroke-width="1.5" marker-end="url(#am)"/>
+  <rect x="246" y="248" width="360" height="56" rx="10" fill="#fffbeb" stroke="#d97706" stroke-width="1.5"/>
+  <text x="426" y="271" text-anchor="middle" font-size="12.5" font-weight="700" fill="#b45309">main thread: check_prefetch_progress(req) polling</text>
+  <text x="426" y="290" text-anchor="middle" font-size="11" font-weight="700" fill="#dc2626">per-request, no alignment across PP ranks</text>
+
+  <!-- right: consequence (elbow connector into the red box's left-middle) -->
+  <path d="M606,276 H625 V213 H640" fill="none" stroke="#6b7280" stroke-width="1.5" marker-end="url(#am)"/>
+  <rect x="644" y="150" width="196" height="126" rx="10" fill="#fef2f2" stroke="#dc2626" stroke-width="1.5"/>
+  <text x="742" y="178" text-anchor="middle" font-size="12.5" font-weight="700" fill="#b91c1c">Result under PP + L3</text>
+  <text x="742" y="202" text-anchor="middle" font-size="11" fill="#4b5563">two divergent quantities</text>
+  <text x="742" y="219" text-anchor="middle" font-size="11" fill="#4b5563">not unified across PP</text>
+  <text x="742" y="243" text-anchor="middle" font-size="11" fill="#4b5563">host trees drift →</text>
+  <text x="742" y="260" text-anchor="middle" font-size="11" font-weight="700" fill="#b91c1c">shape-mismatch crash</text>
+
+  <text x="426" y="40" text-anchor="middle" font-size="11" fill="#6b7280">no PrefetchAck · no prefetch_sync_queue · no background sync thread</text>
+</svg>
diff --git a/public/images/blog/pp_hicache_consistency/preview.png b/public/images/blog/pp_hicache_consistency/preview.png
new file mode 100644
index 000000000..31430c44a
Binary files /dev/null and b/public/images/blog/pp_hicache_consistency/preview.png differ
diff --git a/public/images/blog/pp_hicache_consistency/skew.gif b/public/images/blog/pp_hicache_consistency/skew.gif
new file mode 100644
index 000000000..ad860dd06
Binary files /dev/null and b/public/images/blog/pp_hicache_consistency/skew.gif differ
diff --git a/public/images/blog/pp_hicache_consistency/threads.gif b/public/images/blog/pp_hicache_consistency/threads.gif
new file mode 100644
index 000000000..009748f32
Binary files /dev/null and b/public/images/blog/pp_hicache_consistency/threads.gif differ