
[CALCITE-7618] Add filter pushdown support to the file adapter's CSV table implementation #5048

Open

Diveyam-Mishra wants to merge 1 commit into apache:main from Diveyam-Mishra:CALCITE-7618


@Diveyam-Mishra

Contributor

Jira Link

[CALCITE-7618]

Changes Proposed

This PR implements filter pushdown for the file adapter's CSV table using planner rules instead of the FilterableTable interface. This lets Calcite make cost-based planning decisions, estimate the cost reduction that pushdown brings, and display the pushed-down predicates in EXPLAIN plans.

Implementation Details:

  1. Rule-Based Pushdown:
    • Introduced CsvFilterTableScanRule, which matches a LogicalFilter on top of a CsvTableScan and pushes simple equality predicates (col = literal) into the scan.
    • Introduced CsvProjectFilterTableScanRule, which matches LogicalProject → LogicalFilter → CsvTableScan and pushes the filter down first. This prevents the planner from prematurely collapsing the project and filter into a generic EnumerableCalc, which would bypass pushdown.
  2. Scan State & Costing:
    • Updated CsvTableScan to store and propagate @Nullable String[] filterValues.
    • Updated CsvTableScan#computeSelfCost to reduce the estimated cost in proportion to the number of pushed-down filters.
    • Extended CsvTableScan#explainTerms to format filters as filters=[[colIndex=value]] in EXPLAIN outputs.
  3. Execution Support:
    • Added CsvTranslatableTable#scan(DataContext, int[], String[]), which the generated code invokes dynamically when filters are present.
    • Made CsvEnumerator#converter package-private so that CsvTranslatableTable can reuse it to resolve the correct row converter. This ensures that single-column projections return raw objects rather than Object[] arrays, which would otherwise cause class cast errors.
  4. Testing:
    • Added targeted unit tests in FileAdapterTest.java that verify pushdown, combination with projection, result correctness, and that non-pushable predicates are kept as a residual filter.
    • Updated the expected plans in testPushDownProjectAggregateWithFilter to reflect the newly optimized scans.
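The split that items 1 and 2 describe (pushable col = literal terms go into a per-column filterValues array, everything else stays in the Filter) can be sketched outside Calcite as follows. This is a simplified, hypothetical model: Conjunct, Split and split are illustration-only names, not Calcite's RexNode API. Leaving a second equality on an already-filtered column in the residual filter is an assumption, not necessarily what the PR does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of splitting AND-ed conjuncts into pushable and residual parts. */
final class FilterSplitSketch {

  /** One conjunct of the condition (hypothetical stand-in for a RexNode). */
  record Conjunct(String op, int column, String literal) {}

  /** Per-column equality values for the scan, plus what stays in the Filter. */
  record Split(String[] filterValues, List<Conjunct> residual) {}

  static Split split(List<Conjunct> conjuncts, int fieldCount) {
    // One slot per column; null means "no pushed-down value for this column".
    String[] filterValues = new String[fieldCount];
    List<Conjunct> residual = new ArrayList<>();
    for (Conjunct c : conjuncts) {
      boolean pushable = c.op().equals("=")
          && c.column() >= 0 && c.column() < fieldCount
          // Assumption: a second equality on the same column is not merged.
          && filterValues[c.column()] == null;
      if (pushable) {
        filterValues[c.column()] = c.literal();
      } else {
        residual.add(c);
      }
    }
    return new Split(filterValues, residual);
  }

  public static void main(String[] args) {
    // deptno = 20 is pushable; empno > 100 is not.
    Split s = split(
        List.of(new Conjunct("=", 2, "20"), new Conjunct(">", 0, "100")), 3);
    System.out.println(Arrays.toString(s.filterValues())); // [null, null, 20]
    System.out.println(s.residual().size());               // 1
  }
}
```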

To verify the change, run:

.\sqlline.bat -u "jdbc:calcite:model=file/src/test/resources/smart.json" -n admin -p admin -e "!set maxwidth 10000" -e "explain plan for select name, empno from EMPS where deptno = 20"

Before this change, the plan was:

PLAN=EnumerableCalc(expr#0..2=[{inputs}], expr#3=[20], expr#4=[=($t2, $t3)], NAME=[$t1], EMPNO=[$t0], $condition=[$t4])
CsvTableScan(table=[[SALES, EMPS]], fields=[[0, 1, 2]])

After this change, the filter and projection are pushed down into CsvTableScan, resulting in:

CsvTableScan(table=[[SALES, EMPS]], fields=[[1, 0]], filters=[[2=20]])

This demonstrates that the scan now reads only the required columns (name, empno) and applies the deptno = 20 filter during the table scan itself.
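To make the plan concrete, here is a plain-Java sketch of what a scan with fields=[[1, 0]] and filters=[[2=20]] computes: it keeps the rows whose column 2 equals "20", then emits columns 1 and 0 in that order. The method name and the three in-memory rows (EMPNO, NAME, DEPTNO) are hypothetical. The sketch models the raw string matching that the review below attributes to CsvEnumerator, not the PR's actual code.

```java
import java.util.ArrayList;
import java.util.List;

final class PushedScanSketch {

  /**
   * Keeps rows whose columns match every non-null entry of filterValues
   * (plain string equality on the raw CSV text), then projects `fields`
   * in the given order.
   */
  static List<String[]> scan(List<String[]> rows, int[] fields, String[] filterValues) {
    List<String[]> out = new ArrayList<>();
    for (String[] row : rows) {
      boolean keep = true;
      for (int i = 0; i < filterValues.length && keep; i++) {
        // null means "no pushed-down filter on column i".
        keep = filterValues[i] == null || filterValues[i].equals(row[i]);
      }
      if (!keep) {
        continue;
      }
      String[] projected = new String[fields.length];
      for (int j = 0; j < fields.length; j++) {
        projected[j] = row[fields[j]];
      }
      out.add(projected);
    }
    return out;
  }

  public static void main(String[] args) {
    // Hypothetical rows with columns EMPNO, NAME, DEPTNO.
    List<String[]> rows = List.of(
        new String[] {"1", "Alice", "10"},
        new String[] {"2", "Bob", "20"},
        new String[] {"3", "Carol", "20"});
    // fields=[[1, 0]], filters=[[2=20]]; prints "Bob,2" then "Carol,3".
    for (String[] r : scan(rows, new int[] {1, 0}, new String[] {null, null, "20"})) {
      System.out.println(String.join(",", r));
    }
  }
}
```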

Diveyam-Mishra force-pushed the CALCITE-7618 branch 2 times, most recently from 66fb2ac to e0535c1 on June 24, 2026 21:37.

protected CsvTableScan(RelOptCluster cluster, RelOptTable table,
    CsvTranslatableTable csvTable, int[] fields,
    @Nullable String @Nullable [] filterValues) {

Contributor


I think CsvEnumerator is actually broken, since it does string comparisons.
This means, for example, that 0.0 != 0 in a filter.
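The problem raised here can be reproduced in isolation. In the sketch below (helper names are hypothetical), comparing the raw CSV text treats "0.0" and "0" as different values, while parsing both sides as BigDecimal and using compareTo treats them as equal. Parsing by the column's declared type before comparing is one possible fix; the sketch assumes a numeric column.

```java
import java.math.BigDecimal;

final class CsvComparisonSketch {

  /** Compares the raw CSV text with the literal, character by character. */
  static boolean textEquals(String csvValue, String literal) {
    return csvValue.equals(literal);
  }

  /**
   * Hypothetical typed comparison for a numeric column: parse both sides
   * and compare by value (compareTo ignores scale, unlike equals).
   */
  static boolean numericEquals(String csvValue, String literal) {
    return new BigDecimal(csvValue.trim()).compareTo(new BigDecimal(literal.trim())) == 0;
  }

  public static void main(String[] args) {
    System.out.println(textEquals("0.0", "0"));      // false
    System.out.println(numericEquals("0.0", "0"));   // true
    System.out.println(textEquals("20", "20.0"));    // false
    System.out.println(numericEquals("20", "20.0")); // true
  }
}
```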

Diveyam-Mishra marked this pull request as draft on June 24, 2026 22:30.
Diveyam-Mishra force-pushed the CALCITE-7618 branch 3 times, most recently from f60ce88 to d5d601a on June 27, 2026 18:29.
@sonarqubecloud

Diveyam-Mishra marked this pull request as ready for review on June 27, 2026 19:24.
@Diveyam-Mishra

Contributor Author

I may have complicated a few things. I kept getting style errors locally, which I tried to fix, but I may have been doing something wrong: I tried stopping the daemon and rebuilding, yet something still went haywire. If needed, I can open a new PR with a single clean commit.

@mihaibudiu

Contributor

Please use fresh commits until we finish the review, to make it easier to see what changed in response to reviewers.

if (o1 == null || o2 == null) {
  return false;
}
if (o1 instanceof BigDecimal && o2 instanceof BigDecimal) {

Contributor


Why is this case needed? Doesn't BigDecimal have equals?
If it does, can this become Objects.equals()?
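One fact relevant to this question: BigDecimal.equals compares scale as well as value, so new BigDecimal("0.0").equals(new BigDecimal("0")) is false, and Objects.equals inherits that behavior. Objects.equals(null, null) is also true, whereas the snippet above returns false when either side is null. The helper below is hypothetical and only mirrors that snippet's shape. Because the body of the BigDecimal branch is not shown here, using compareTo in it is an assumption about what the branch is meant to do.

```java
import java.math.BigDecimal;
import java.util.Objects;

final class BigDecimalEqualitySketch {

  /**
   * Hypothetical null-rejecting equality, shaped like the snippet in this
   * thread: null on either side never matches (as with SQL "=").
   */
  static boolean valueEquals(Object o1, Object o2) {
    if (o1 == null || o2 == null) {
      return false;
    }
    if (o1 instanceof BigDecimal && o2 instanceof BigDecimal) {
      // compareTo ignores scale, so 0.0 and 0 compare as equal.
      return ((BigDecimal) o1).compareTo((BigDecimal) o2) == 0;
    }
    return o1.equals(o2);
  }

  public static void main(String[] args) {
    BigDecimal a = new BigDecimal("0.0");
    BigDecimal b = new BigDecimal("0");
    System.out.println(a.equals(b));                // false: scales 1 and 0 differ
    System.out.println(Objects.equals(a, b));       // false: delegates to equals
    System.out.println(valueEquals(a, b));          // true
    System.out.println(Objects.equals(null, null)); // true
    System.out.println(valueEquals(null, null));    // false
  }
}
```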

* {@link CsvTableScan}.
*
* <p>Only equality conditions of the form {@code column = literal} can be
* pushed down, because {@link CsvEnumerator} only supports per-column

Contributor


Could this situation be improved? Is this a fundamental limitation of CsvEnumerator?
Maybe we need a more powerful enumerator.
In principle, I think any predicate over the current row's values should work.
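As a sketch of this suggestion (not code from the PR): if the enumerator converted each CSV field to its column type first, a filter could evaluate any predicate over the current row. The class below is hypothetical and uses java.util.Iterator and java.util.function.Predicate to stay self-contained; Calcite's own Enumerator interface uses a moveNext()/current() protocol instead, so a real version would need adapting.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

/**
 * Hypothetical row filter: wraps a source of already-typed rows and yields
 * only those for which an arbitrary predicate holds.
 */
final class PredicateFilteringIterator implements Iterator<Object[]> {
  private final Iterator<Object[]> source;
  private final Predicate<Object[]> predicate;
  /** Next matching row, or null when it has not been found yet. */
  private Object[] pending;

  PredicateFilteringIterator(Iterator<Object[]> source, Predicate<Object[]> predicate) {
    this.source = source;
    this.predicate = predicate;
  }

  @Override
  public boolean hasNext() {
    // Advance the source until a row passes the predicate or input runs out.
    while (pending == null && source.hasNext()) {
      Object[] row = source.next();
      if (predicate.test(row)) {
        pending = row;
      }
    }
    return pending != null;
  }

  @Override
  public Object[] next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    Object[] row = pending;
    pending = null;
    return row;
  }

  public static void main(String[] args) {
    // Columns: EMPNO (Integer), NAME (String), DEPTNO (Integer); made-up data.
    List<Object[]> rows = List.of(
        new Object[] {1, "Alice", 10},
        new Object[] {2, "Bob", 20},
        new Object[] {3, "Carol", 30});
    // A range test plus a string test: not expressible as "column = literal".
    Iterator<Object[]> it = new PredicateFilteringIterator(rows.iterator(),
        row -> (Integer) row[2] >= 20 && ((String) row[1]).startsWith("C"));
    while (it.hasNext()) {
      System.out.println(it.next()[1]); // prints Carol
    }
  }
}
```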

sql("model-with-custom-table", sql).ok();
}

/** Test case for

Contributor


It would be nice to have higher test coverage in terms of the SQL types of the columns.


2 participants