[CALCITE-7618] Add filter pushdown support to the file adapter's CSV table implementation by Diveyam-Mishra · Pull Request #5048 · apache/calcite

Diveyam-Mishra · 2026-06-24T21:23:03Z

Jira link: CALCITE-7618

Changes Proposed

This PR implements filter pushdown for the file adapter's CSV table using planner rules instead of the FilterableTable interface. With rules, Calcite can make better-informed planning decisions, account for the reduced cost of a filtered scan, and show the pushed-down predicates in EXPLAIN plans.

Implementation Details:

Rule-Based Pushdown:
- Introduced CsvFilterTableScanRule, which matches a LogicalFilter on a CsvTableScan and pushes simple equality predicates (col = literal) into the scan.
- Introduced CsvProjectFilterTableScanRule, which matches LogicalProject → LogicalFilter → CsvTableScan and pushes the filter down first. Otherwise the planner may collapse the project and filter into a generic EnumerableCalc before pushdown happens, bypassing it.
Scan State & Costing:
- Updated CsvTableScan to store and propagate @Nullable String[] filterValues.
- Updated CsvTableScan#computeSelfCost to reduce the estimated cost in proportion to the number of pushed-down filters.
- Extended CsvTableScan#explainTerms to format filters as filters=[[colIndex=value]] in EXPLAIN outputs.
Execution Support:
- Added CsvTranslatableTable#scan(DataContext, int[], String[]), which the generated code invokes dynamically when filters are present.
- Made CsvEnumerator#converter package-private so that CsvTranslatableTable can reuse it to pick the correct row converter. This ensures that single-column projections return raw objects rather than Object[] arrays, which would otherwise cause a ClassCastException.
Testing:
- Added targeted unit tests in FileAdapterTest.java that verify pushdown, its combination with projection, result correctness, and that non-pushable predicates remain in a residual filter.
- Updated existing plans in testPushDownProjectAggregateWithFilter to reflect the newly optimized scan plans.
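The split that CsvFilterTableScanRule performs can be sketched without Calcite's classes. This is a simplified model, not the PR's code: Cond, Eq, And and Other are hypothetical stand-ins for RexNode conditions, and only a single top-level AND is handled. Each `column = literal` conjunct fills one slot of a filterValues array (presumably indexed by column, as the filters=[[2=20]] EXPLAIN output suggests), and every other conjunct is kept as a residual filter above the scan.

```java
import java.util.ArrayList;
import java.util.List;

class PushdownSketch {
  /** Hypothetical stand-in for a RexNode filter condition. */
  interface Cond {}

  /** {@code $column = literal}: the only shape pushed into the scan. */
  record Eq(int column, String literal) implements Cond {}

  /** A conjunction (AND) of conditions. */
  record And(List<Cond> operands) implements Cond {}

  /** Any other predicate, e.g. a range, which must stay above the scan. */
  record Other(String description) implements Cond {}

  /** Pushed-down values (null = no filter on that column) plus the residual. */
  record Split(String[] filterValues, List<Cond> residual) {}

  static Split split(Cond condition, int fieldCount) {
    String[] filterValues = new String[fieldCount];
    List<Cond> residual = new ArrayList<>();
    // Only a top-level AND is flattened in this sketch.
    List<Cond> conjuncts =
        condition instanceof And a ? a.operands() : List.of(condition);
    for (Cond c : conjuncts) {
      if (c instanceof Eq eq && filterValues[eq.column()] == null) {
        filterValues[eq.column()] = eq.literal(); // push into the scan
      } else {
        residual.add(c); // keep in a filter above the scan
      }
    }
    return new Split(filterValues, residual);
  }
}
```

A second equality on a column that is already constrained stays in the residual filter, so `a = 1 AND a = 2` still correctly yields no rows.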

To verify the change, run:

.\sqlline.bat -u "jdbc:calcite:model=file/src/test/resources/smart.json" -n admin -p admin -e "!set maxwidth 10000" -e "explain plan for select name, empno from EMPS where deptno = 20"

Before this change, the plan was:

PLAN=EnumerableCalc(expr#0..2=[{inputs}], expr#3=[20], expr#4=[=($t2, $t3)], NAME=[$t1], EMPNO=[$t0], $condition=[$t4])
CsvTableScan(table=[[SALES, EMPS]], fields=[[0, 1, 2]])

After this change, the filter and projection are pushed down into CsvTableScan, resulting in:

CsvTableScan(table=[[SALES, EMPS]], fields=[[1, 0]], filters=[[2=20]])

This demonstrates that the scan now reads only the required columns (name, empno) and applies the deptno = 20 filter during the table scan itself.
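The behavior of the pushed-down scan in the plan above can be sketched with plain arrays. This assumes rows are still raw String[] values when the filter is applied, which matches the reviewer's note about string comparisons below; ScanSketch and its sample rows are hypothetical and are not the contents of the test's CSV file. Here fields=[[1, 0]] selects NAME then EMPNO, and filters=[[2=20]] keeps rows whose third column equals "20".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ScanSketch {
  /**
   * Returns the projected rows whose columns match every non-null entry of
   * filterValues, using plain String equality.
   */
  static List<String[]> scan(List<String[]> rows, int[] fields,
      String[] filterValues) {
    List<String[]> out = new ArrayList<>();
    outer:
    for (String[] row : rows) {
      for (int i = 0; i < filterValues.length; i++) {
        if (filterValues[i] != null && !filterValues[i].equals(row[i])) {
          continue outer; // row rejected during the scan itself
        }
      }
      String[] projected = new String[fields.length];
      for (int j = 0; j < fields.length; j++) {
        projected[j] = row[fields[j]]; // keep only the requested columns
      }
      out.add(projected);
    }
    return out;
  }

  public static void main(String[] args) {
    // Hypothetical rows: EMPNO, NAME, DEPTNO
    List<String[]> rows = List.of(
        new String[] {"100", "Fred", "10"},
        new String[] {"110", "Eric", "20"},
        new String[] {"120", "Wilma", "20"});
    // fields=[[1, 0]], filters=[[2=20]]
    for (String[] r : scan(rows, new int[] {1, 0}, new String[] {null, null, "20"})) {
      System.out.println(Arrays.toString(r)); // [Eric, 110] then [Wilma, 120]
    }
  }
}
```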

mihaibudiu · 2026-06-24T22:04:29Z

+
+  protected CsvTableScan(RelOptCluster cluster, RelOptTable table,
+      CsvTranslatableTable csvTable, int[] fields,
+      @Nullable String @Nullable [] filterValues) {


I think CsvEnumerator is actually broken, since it compares values as strings.
This means, for example, that 0.0 != 0 in a filter.
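The concern can be reproduced in isolation: if the pushed-down literal and the CSV cell are compared as text, numerically equal values with different spellings do not match. numericEquals is a hypothetical type-aware alternative for numeric columns; exactly how the rule renders a literal such as 0 or 0.0 into filterValues is an assumption here.

```java
import java.math.BigDecimal;

class StringVsTypedComparison {
  /** What a purely textual per-column filter effectively does. */
  static boolean stringEquals(String filterValue, String cell) {
    return filterValue.equals(cell);
  }

  /** Hypothetical numeric comparison: parse both sides, ignore scale. */
  static boolean numericEquals(String filterValue, String cell) {
    return new BigDecimal(filterValue).compareTo(new BigDecimal(cell)) == 0;
  }

  public static void main(String[] args) {
    System.out.println(stringEquals("0", "0.0"));  // false
    System.out.println(numericEquals("0", "0.0")); // true
  }
}
```

A similar mismatch would be expected for other spellings, such as "020" versus "20" in an integer column, which is one reason broader SQL-type coverage in the tests would help.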


sonarqubecloud · 2026-06-27T18:59:06Z

Quality Gate passed

Issues
4 New issues
0 Accepted issues

Measures
0 Security Hotspots
88.3% Coverage on New Code
41.5% Duplication on New Code

See analysis details on SonarQube Cloud

Diveyam-Mishra · 2026-06-27T19:27:10Z

I might have complicated a few things. I kept getting style errors locally, which I tried to fix, but I may have been doing something wrong: I tried stopping the daemon and rebuilding, yet something still went haywire. If needed, I can open a new PR with a single clean commit.

mihaibudiu · 2026-06-27T22:08:39Z

Please use fresh commits until we finish the review, to make it easier to see what changed in response to reviewers.

mihaibudiu · 2026-06-27T22:33:18Z

+    if (o1 == null || o2 == null) {
+      return false;
+    }
+    if (o1 instanceof BigDecimal && o2 instanceof BigDecimal) {


Why is this case needed? Doesn't BigDecimal have equals?
If it does, can this become Objects.equals()?
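One plausible reason for the special case: BigDecimal.equals compares scale as well as numeric value, so Objects.equals would treat 2.0 and 2.00 as different. valueEquals below is a hypothetical reconstruction of the helper from the lines quoted above; the rest of its body is not shown in the review.

```java
import java.math.BigDecimal;
import java.util.Objects;

class BigDecimalEquality {
  /** Hypothetical helper mirroring the quoted snippet. */
  static boolean valueEquals(Object o1, Object o2) {
    if (o1 == null || o2 == null) {
      return false; // in SQL, an equality involving NULL is never TRUE
    }
    if (o1 instanceof BigDecimal && o2 instanceof BigDecimal) {
      // compareTo ignores scale, unlike equals
      return ((BigDecimal) o1).compareTo((BigDecimal) o2) == 0;
    }
    return Objects.equals(o1, o2);
  }

  public static void main(String[] args) {
    BigDecimal a = new BigDecimal("2.0");  // scale 1
    BigDecimal b = new BigDecimal("2.00"); // scale 2
    System.out.println(a.equals(b));          // false
    System.out.println(Objects.equals(a, b)); // false, delegates to equals()
    System.out.println(valueEquals(a, b));    // true
  }
}
```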

mihaibudiu · 2026-06-27T22:36:14Z

+ * {@link CsvTableScan}.
+ *
+ * <p>Only equality conditions of the form {@code column = literal} can be
+ * pushed down, because {@link CsvEnumerator} only supports per-column


Could this situation be improved? Is this a fundamental limitation of CsvEnumerator?
Maybe we need a more powerful enumerator.
In principle I think any predicate of the current row value should work.
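A minimal sketch of the more general approach suggested here, assuming each CSV line is first converted to typed Java values and then tested against an arbitrary row predicate. PredicateScanSketch and its sample rows are hypothetical; in Calcite the predicate would have to be generated from the filter's RexNode condition, which this sketch does not attempt.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class PredicateScanSketch {
  /** Keeps the typed rows for which the (arbitrary) row predicate holds. */
  static List<Object[]> scan(List<Object[]> typedRows,
      Predicate<Object[]> condition) {
    List<Object[]> out = new ArrayList<>();
    for (Object[] row : typedRows) {
      if (condition.test(row)) {
        out.add(row);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    // Hypothetical typed rows: EMPNO, NAME, DEPTNO
    List<Object[]> rows = List.of(
        new Object[] {100, "Fred", 10},
        new Object[] {110, "Eric", 20},
        new Object[] {130, "Alice", 40});
    // deptno > 15 AND empno < 120: a range predicate that an
    // equality-only filterValues array cannot express
    Predicate<Object[]> cond = r -> (Integer) r[2] > 15 && (Integer) r[0] < 120;
    System.out.println(scan(rows, cond).size()); // 1
  }
}
```

The trade-off is that every row is converted before the predicate runs, whereas a string-based filter can reject a row before converting it.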

mihaibudiu · 2026-06-27T22:38:34Z

    sql("model-with-custom-table", sql).ok();
  }

+  /** Test case for


It would be nice to have higher test coverage in terms of SQL column types.

Diveyam-Mishra force-pushed the CALCITE-7618 branch 2 times, most recently from 66fb2ac to e0535c1, on June 24, 2026 21:37

mihaibudiu reviewed Jun 24, 2026


Diveyam-Mishra marked this pull request as draft June 24, 2026 22:30

Diveyam-Mishra force-pushed the CALCITE-7618 branch 3 times, most recently from f60ce88 to d5d601a, on June 27, 2026 18:29

Commit 3fd66a7: [CALCITE-7618] Add filter pushdown support to the file adapter's CSV table implementation

Diveyam-Mishra force-pushed the CALCITE-7618 branch from d5d601a to 3fd66a7 on June 27, 2026 18:43

Diveyam-Mishra marked this pull request as ready for review June 27, 2026 19:24

Diveyam-Mishra requested a review from mihaibudiu June 27, 2026 19:25

mihaibudiu reviewed Jun 27, 2026


Diveyam-Mishra wants to merge 1 commit into apache:main from Diveyam-Mishra:CALCITE-7618

2 participants

Uh oh!

Conversation

Diveyam-Mishra commented Jun 24, 2026

Jira Link

Changes Proposed

Implementation Details:

Uh oh!

mihaibudiu Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

sonarqubecloud Bot commented Jun 27, 2026

Quality Gate passed

Uh oh!

Diveyam-Mishra commented Jun 27, 2026

Uh oh!

mihaibudiu commented Jun 27, 2026

Uh oh!

mihaibudiu Jun 27, 2026

Choose a reason for hiding this comment

Uh oh!

mihaibudiu Jun 27, 2026

Choose a reason for hiding this comment

Uh oh!

mihaibudiu Jun 27, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants