feat: add table-size based cache pass-through #1
Conversation
```rust
let eager = self.eager_shredding;
let rewritten = plan
    .transform_up(|node| try_optimize_parquet_source(node, cache, eager, max_scan_bytes))
    .unwrap();
```
nit: `optimize` returns `Result`, so this should propagate the error instead of panicking. (not blocking)
Suggested change:
```diff
-    .unwrap();
+    ?;
```
```rust
    .flat_map(|g| g.files())
    .map(|f| f.object_meta.size)
    .sum();
if total > max_bytes {
```
nit: the size-based bypass has no test coverage. A small test that registers a parquet file, sets `max_scan_bytes` to 0 (or 1), and verifies the resulting plan still uses `ParquetSource` (not `LiquidParquetSource`) would make this easy to verify and protect against regression; a sketch follows. (not blocking)
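A minimal sketch of such a test, under assumptions: `LiquidCacheLocalBuilder::build`'s exact signature, the fixture path, and the plan-display strings are guesses, not verified API; only `with_max_scan_bytes` comes from this PR.

```rust
use datafusion::physical_plan::displayable;
use datafusion::prelude::*;

#[tokio::test]
async fn large_scan_bypasses_liquid_cache() -> datafusion::error::Result<()> {
    // A 1-byte threshold guarantees any real parquet file exceeds it.
    // Builder construction is assumed from the PR description.
    let (ctx, _cache) = LiquidCacheLocalBuilder::new()
        .with_max_scan_bytes(1)
        .build(SessionConfig::new())?;
    ctx.register_parquet("t", "tests/data/small.parquet", ParquetReadOptions::default())
        .await?;

    let plan = ctx.sql("SELECT * FROM t").await?.create_physical_plan().await?;
    let display = displayable(plan.as_ref()).indent(true).to_string();

    // Above the threshold, the scan must stay on the vanilla parquet reader.
    assert!(!display.contains("LiquidParquetSource"));
    Ok(())
}
```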
```yaml
  RUSTDOCFLAGS: -D warnings

      - name: Run clippy
        run: cargo clippy --all-targets --all-features -- -D warnings
```
Blocking: the CI no longer runs `cargo test`. After this diff the pipeline only checks formatting and clippy; no tests execute. This means regressions in existing behaviour, including the new size-bypass logic, are invisible to CI.
Add a test step (coverage reporting can be dropped if `CODECOV_TOKEN` is unavailable in this fork):
```diff
       - name: Run clippy
         run: cargo clippy --all-targets --all-features -- -D warnings
+      - name: Run tests
+        run: cargo test --all-features
```
Review
Blocking Issues
`.github/workflows/ci.yml` line 30: The CI pipeline no longer runs `cargo test`. After this PR the only checks are formatting and clippy; the test suite never executes. This means correctness regressions (including in the new size-bypass path) go undetected. A `cargo test --all-features` step must be added.
Action Required
Add `cargo test --all-features` (or restore a dedicated test job) to the check job before merge. Coverage upload can be dropped if `CODECOV_TOKEN` is unavailable in this fork, but the test run itself must remain.
```yaml
  uses: model-checking/kani-github-action@v1.1
  with:
    working-directory: src/core

  run: cargo clippy --workspace --exclude dev-tools --all-targets --all-features -- -D warnings
```
Blocking (unresolved from previous review): the pipeline still runs no tests. Format and clippy check syntax and style, not behaviour; any regression in the new size-bypass logic (or anything else) is invisible to CI.
Add a test step:
```diff
-        run: cargo clippy --workspace --exclude dev-tools --all-targets --all-features -- -D warnings
+        run: cargo clippy --all-targets --all-features -- -D warnings
+      - name: Run tests
+        run: cargo test --all-features
```
```rust
        .transform_up(|node| try_optimize_parquet_source(node, cache, eager, max_scan_bytes))
        .unwrap();
    Ok(rewritten.data)
}
```
nit: `optimize` returns `Result`, so a panic here is a regression over the old code (which delegated to `rewrite_data_source_plan` and surfaced errors via `Ok(...)`). `transform_up` returns `Result`; propagate it instead of unwrapping. (not blocking)
Suggested change:
```diff
-    .unwrap();
+    ?;
```
Review
Blocking Issues
`.github/workflows/ci.yml` line 30: No tests run in CI. The pipeline only checks formatting and clippy; `cargo test` was not added back after the previous review flagged this. Any regression in the new size-bypass logic is invisible.
Action Required
Add `cargo test --all-features` as a step in the check job (or a dedicated job) before this can merge.
Summary
Scans whose total parquet file size exceeds a configurable threshold bypass LiquidCache and use DataFusion's vanilla parquet reader instead. This prevents large cold scans from polluting the RAM cache and growing the on-disk spill file without bound.
Problem
In mixed workloads with small hot tables and large cold tables, caching everything causes the disk tier to expand significantly (decoded Arrow batches are ~5x larger than compressed parquet). This creates memory pressure and degrades cache performance for the smaller datasets that benefit most from caching.
Solution
Add `with_max_scan_bytes(u64)` to `LocalModeOptimizer` and `LiquidCacheLocalBuilder`. When set, the optimizer computes the total file size of a parquet scan and skips LiquidCache wrapping if it exceeds the threshold. The scan falls through to DataFusion's vanilla parquet reader.
Default behavior is unchanged: no threshold means cache everything.
Usage
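A minimal usage sketch, assuming the builder API described above; the crate path and `build` signature are assumptions, and only `with_max_scan_bytes` comes from this PR.

```rust
use datafusion::prelude::SessionConfig;
use liquid_cache_local::LiquidCacheLocalBuilder; // assumed module path

// Scans whose total parquet size exceeds 1 GiB skip LiquidCache and fall
// through to DataFusion's vanilla parquet reader; smaller scans are cached.
let (ctx, _cache) = LiquidCacheLocalBuilder::new()
    .with_max_scan_bytes(1024 * 1024 * 1024)
    .build(SessionConfig::new())?;
```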
Ref: XiangpengHao#485