hooks/pyramid_attention_broadcast: remove redundant iteration==0 guard and fix stale cache VRAM leak#13497

Open
GitGlimpse895 wants to merge 1 commit into huggingface:main from GitGlimpse895:fix/pab-cache-logic
Conversation

@GitGlimpse895

What does this PR do?

Fixes two bugs in PyramidAttentionBroadcastHook.new_forward:

1. Redundant iteration == 0 condition
After every reset_state(), iteration is reset to 0 and cache to None. So at iteration 0, self.state.cache is None is always True, which makes the self.state.iteration == 0 clause permanently dead code and creates the misleading impression of two independent invariants.
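A minimal sketch of the invariant being described (class and function names here are illustrative, not the actual hook code from diffusers):

```python
class PABState:
    """Simplified stand-in for the hook's per-layer state."""

    def __init__(self):
        self.iteration = 0
        self.cache = None

    def reset_state(self):
        # Both fields are reset together, so "iteration == 0"
        # always implies "cache is None".
        self.iteration = 0
        self.cache = None


def should_compute_attention(state):
    # Original guard was effectively:
    #     state.iteration == 0 or state.cache is None
    # Since reset_state() zeroes both fields at once, the first
    # clause can never be the deciding condition; checking the
    # cache alone is equivalent.
    return state.cache is None
```

With this simplification the remaining check expresses the single real invariant: recompute attention whenever there is no cached output to broadcast.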

2. Stale cache leaking GPU VRAM
When outside the active timestep range, the hook was unconditionally writing self.state.cache = output, holding a full hidden-state activation tensor on GPU until the next generation's reset_state() call. For video transformers with dozens of PAB-hooked layers, this accumulates hundreds of MBs of unreleased VRAM. The fix sets self.state.cache = None immediately when outside the range.

This is a re-submission of #13467 (accidentally self-closed). Apologies for the noise, @sayakpaul!

Checklist

  • Did you read the contributor guideline?
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Who can review?

@sayakpaul @DN6

@github-actions github-actions bot added size/S PR with diff < 50 LOC and removed size/S PR with diff < 50 LOC labels Apr 18, 2026

Labels

hooks size/S PR with diff < 50 LOC
