feat: behavior hashing for cache key invalidation #196
Open
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #196      +/-   ##
==========================================
- Coverage   95.98%   95.20%   -0.79%
==========================================
  Files         140      141       +1
  Lines        9797    10362     +565
  Branches      568      601      +33
==========================================
+ Hits         9404     9865     +461
- Misses        275      369      +94
- Partials      118      128      +10
Force-pushed from fa37bfb to c9fe30e
Add compute_behavior_token(), which produces a SHA-256 fingerprint of a class's method bytecode. Decorator chains (@Flow.call, etc.) are automatically unwrapped via inspect.unwrap so the hash reflects the user's implementation, not the wrapper.

Key design:
- Walks MRO with override semantics (subclass overrides parent)
- Supports __ccflow_tokenizer_deps__ for extra standalone functions
- Dependencies sorted by qualname (order-insensitive)
- Cached per-class in __behavior_token_cache__ (not inherited)
- Returns None for classes with no hashable methods

Integration: cache_key() now includes behavior tokens for the model and any non-transparent evaluators, so code changes invalidate the cache without requiring a config change.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Pascal Tomecek <pascal.tomecek@cubistsystematic.com>
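The design in this commit message can be illustrated with a minimal sketch. This is not the ccflow implementation: `sketch_behavior_token`, `_code_bytes`, and the demo classes are hypothetical names, and the sketch only covers bytecode, constants, decorator unwrapping, and MRO override semantics.

```python
import functools
import hashlib
import inspect
import types


def _code_bytes(code, doc=None):
    """Serialize bytecode and constants, recursing into nested code objects."""
    parts = [code.co_code]
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            parts.append(_code_bytes(const))
        elif isinstance(const, str) and const == doc:
            continue  # skip the docstring so rewording it does not change the token
        else:
            parts.append(repr(const).encode())
    return b"\x00".join(parts)


def sketch_behavior_token(cls):
    """Hypothetical stand-in for compute_behavior_token(); returns hex or None."""
    methods = {}
    # Walk base-to-derived so a subclass definition overwrites its parent's.
    for klass in reversed(cls.__mro__):
        if klass is object:
            continue
        for name, attr in vars(klass).items():
            if isinstance(attr, (staticmethod, classmethod)):
                attr = attr.__func__
            # inspect.unwrap follows __wrapped__ set by functools.wraps.
            target = inspect.unwrap(attr) if callable(attr) else None
            if inspect.isfunction(target):
                methods[name] = target
    if not methods:
        return None
    digest = hashlib.sha256()
    for name in sorted(methods):
        func = methods[name]
        digest.update(name.encode())
        digest.update(_code_bytes(func.__code__, func.__doc__))
    return digest.hexdigest()


def passthrough(fn):
    """Demo decorator that preserves __wrapped__ via functools.wraps."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        return fn(*args, **kwargs)
    return wrapper


class Plain:
    def run(self, x):
        return x + 1


class Wrapped:
    @passthrough
    def run(self, x):
        return x + 1


class Changed(Plain):
    def run(self, x):
        return x + 2


class Empty:
    pass
```

In this sketch `Plain` and `Wrapped` hash identically because the wrapper is unwrapped before hashing, `Changed` hashes differently because its override replaces the parent's `run`, and `Empty` yields `None`.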
Add compute_data_token() as the single wrapper around dask tokenization and refactor cache_key() to combine precomputed data and behavior tokens instead of mutating one nested payload dict.

This makes cache_key() mostly orchestration:
- flatten the evaluation context chain
- collect data/behavior tokens for the underlying model
- collect data/behavior tokens for non-transparent evaluators
- combine those tokens into one final cache key

Also adds tests for compute_data_token() and opaque evaluator behavior.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Pascal Tomecek <pascal.tomecek@cubistsystematic.com>
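The orchestration steps listed in this commit could look roughly like the sketch below. It is an assumption-laden illustration: canonical JSON stands in for dask tokenization so the example has no third-party dependency, and `sketch_data_token` and `sketch_cache_key` are hypothetical names, not the ccflow functions.

```python
import hashlib
import json


def sketch_data_token(payload):
    """Stand-in for a data token: SHA-256 of canonical JSON (not dask)."""
    canonical = json.dumps(payload, sort_keys=True, default=repr)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def sketch_cache_key(context, model_tokens, evaluator_tokens=()):
    """Combine precomputed (data_token, behavior_token) pairs into one key.

    model_tokens is one pair; evaluator_tokens holds one pair per
    non-transparent evaluator. A behavior token may be None.
    """
    payload = {
        "context": sketch_data_token(context),
        "model": list(model_tokens),
        "evaluators": [list(pair) for pair in evaluator_tokens],
    }
    return sketch_data_token(payload)


context = {"date": "2024-01-01", "symbol": "ABC"}
base_key = sketch_cache_key(context, ("model-data", "model-behavior-v1"))
changed_key = sketch_cache_key(context, ("model-data", "model-behavior-v2"))
```

Because each input is tokenized before being combined, a change to any single token (here the model's behavior token) produces a different final key without touching the others.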
Update behavior hashing so function defaults, keyword-only defaults, and closure cell contents contribute to compute_behavior_token(). This closes a cache-key correctness gap where semantic changes could leave behavior tokens unchanged.

Also merge __ccflow_tokenizer_deps__ across the full MRO instead of first-definition-wins, with deterministic deduping so subclasses can add deps without dropping inherited ones.

Add regression tests for defaults, kwdefaults, closures, inherited deps, and a cache_key integration check for helper default changes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Pascal Tomecek <pascal.tomecek@cubistsystematic.com>
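The gap this commit closes can be shown with a small sketch: two functions can share identical bytecode and constants yet behave differently because of their defaults or captured closure values. `sketch_function_token` is a hypothetical helper, and using `repr()` on constants and cell contents is a simplification that is only stable for simple values.

```python
import hashlib
import inspect


def sketch_function_token(func):
    """Hash bytecode plus defaults, keyword-only defaults, and closure cells."""
    func = inspect.unwrap(func)
    code = func.__code__
    cells = func.__closure__ or ()
    parts = [
        code.co_code,
        repr(code.co_consts).encode(),
        repr(func.__defaults__).encode(),
        repr(sorted((func.__kwdefaults__ or {}).items())).encode(),
        repr([cell.cell_contents for cell in cells]).encode(),
    ]
    digest = hashlib.sha256()
    for part in parts:
        # Length-prefix each part so adjacent parts cannot blur together.
        digest.update(len(part).to_bytes(8, "big"))
        digest.update(part)
    return digest.hexdigest()


def make_scaler(factor):
    """Return a closure whose bytecode is the same for every factor."""
    def scale(x):
        return x * factor
    return scale


def add(x, n=1, *, clip=None):
    return x + n
```

Here `make_scaler(2)` and `make_scaler(3)` differ only in a closure cell, and changing `add.__defaults__` changes no bytecode at all, so a token built from `co_code` and `co_consts` alone would miss both changes.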
Add compute_cache_token() alongside compute_data_token() and compute_behavior_token(), refactor cache_key() to delegate to it, and rename the cached class attribute to __ccflow_tokenizer_cache__ so it matches __ccflow_tokenizer_deps__.

This commit also keeps class support in __ccflow_tokenizer_deps__, including recursive class-dependency detection, and adds regression coverage for combined cache tokens and cache-key integration.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Pascal Tomecek <pascal.tomecek@cubistsystematic.com>
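One way a cached class attribute can be kept per-class rather than inherited is to look it up in the class's own `__dict__` instead of using normal attribute access. The sketch below assumes that approach; the attribute name `__sketch_token_cache__` and the `cached_token` helper are hypothetical and not taken from ccflow.

```python
_CACHE_ATTR = "__sketch_token_cache__"  # hypothetical attribute name


def cached_token(cls, compute):
    """Return compute(cls), cached on cls itself and never on a subclass.

    Checking cls.__dict__ (rather than getattr) ignores values cached on
    parent classes, so a subclass always gets its own token.
    """
    if _CACHE_ATTR not in cls.__dict__:
        setattr(cls, _CACHE_ATTR, compute(cls))
    return cls.__dict__[_CACHE_ATTR]


calls = []


def name_token(cls):
    """Demo compute function that records how often it runs."""
    calls.append(cls.__name__)
    return cls.__name__


class Parent:
    pass


class Child(Parent):
    pass
```

With plain `getattr`, `Child` would see the value already cached on `Parent` and return the parent's token; the `__dict__` check avoids that, and a membership test (rather than comparing against `None`) also lets a cached `None` result be reused.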
Force-pushed from c9fe30e to d1e6924
Add a private SHA-256 helper in ccflow.utils.tokenize so the hash algorithm is defined in one place, rename the tokenize tests to ccflow/tests/utils/test_tokenize.py to match the module name, and document how MemoryCacheEvaluator cache keys are built.

The docs now describe how data tokens, behavior tokens, transparent vs non-transparent evaluators, and __ccflow_tokenizer_deps__ all feed into compute_cache_token().

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Pascal Tomecek <pascal.tomecek@cubistsystematic.com>
timkpaine approved these changes on Apr 23, 2026
Summary

Adds compute_behavior_token(), a deterministic SHA-256 fingerprint of a class's method bytecode. When callable logic changes, cache keys automatically invalidate without requiring config changes.

This is PR 1 of 3 splitting the tokenization work from #195.

What's included

compute_behavior_token(cls)
- Hashes co_code + co_consts (minus docstrings) for each method
- Unwraps decorator chains (@Flow.call, functools.wraps, etc.) via inspect.unwrap
- Supports __ccflow_tokenizer_deps__ for declaring extra standalone function dependencies
- Cached per class in __behavior_token_cache__ (not inherited by subclasses)
- Returns None for classes with no hashable methods

cache_key() integration
- Includes the behavior token when it is not None (backward-compatible: no change for classes without methods)

What's NOT included (future PRs)
- normalize_token
- model_token property on BaseModel

Tests

26 new tests covering:
- __ccflow_tokenizer_deps__ ordering and changes
- @Flow.call decorator unwrapping
- cache_key() integration with CallableModel

All 672 existing tests pass (2 skipped).
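For readers unfamiliar with `__ccflow_tokenizer_deps__`, the sketch below shows how declared dependencies might be merged across the MRO, deduplicated, and sorted by qualname so declaration order does not matter. It assumes the attribute is a list of functions; `collect_deps` and the helper functions are hypothetical and not ccflow code.

```python
def collect_deps(cls):
    """Merge __ccflow_tokenizer_deps__ from every class in the MRO.

    Duplicates are dropped by identity, and the result is sorted by
    (module, qualname) so declaration order does not affect the outcome.
    """
    seen = set()
    merged = []
    for klass in cls.__mro__:
        # vars() reads only what this class declares itself.
        for dep in vars(klass).get("__ccflow_tokenizer_deps__", ()):
            if id(dep) not in seen:
                seen.add(id(dep))
                merged.append(dep)
    return sorted(merged, key=lambda dep: (dep.__module__, dep.__qualname__))


def helper_a(x):
    return x + 1


def helper_b(x):
    return x * 2


class BaseModelSketch:
    __ccflow_tokenizer_deps__ = [helper_a]


class ChildModelSketch(BaseModelSketch):
    # Adds a dep and repeats an inherited one; neither drops helper_a.
    __ccflow_tokenizer_deps__ = [helper_b, helper_a]


dep_names = [dep.__qualname__ for dep in collect_deps(ChildModelSketch)]
```

Each collected function would then be hashed alongside the class's own methods, so editing a declared helper also changes the behavior token.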