[CALCITE-7618] Add filter pushdown support to the file adapter's CSV table implementation #5048
Diveyam-Mishra wants to merge 1 commit into
66fb2ac to e0535c1
  protected CsvTableScan(RelOptCluster cluster, RelOptTable table,
      CsvTranslatableTable csvTable, int[] fields,
      @Nullable String @Nullable [] filterValues) {
I think CsvEnumerator is actually broken, since it does string comparisons.
This means for example that 0.0 != 0 in a filter.
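The concern above can be illustrated with a small, self-contained sketch. It does not use the actual CsvEnumerator code; the class and method names below are hypothetical, and the numeric variant is only one possible way to compare values.

```java
import java.math.BigDecimal;

// Hypothetical illustration; not code from the PR or from CsvEnumerator.
final class FilterComparisonSketch {
  /** Compares the raw CSV cell text with the filter text, character by character. */
  static boolean matchesAsString(String cell, String filterValue) {
    return cell.equals(filterValue);
  }

  /** Parses both sides as decimals and compares them by numeric value. */
  static boolean matchesAsNumber(String cell, String filterValue) {
    return new BigDecimal(cell).compareTo(new BigDecimal(filterValue)) == 0;
  }

  public static void main(String[] args) {
    System.out.println(matchesAsString("0.0", "0")); // false: the strings differ
    System.out.println(matchesAsNumber("0.0", "0")); // true: the values are equal
  }
}
```

A type-aware comparison like the second method would need to know the column's SQL type, which a plain string filter value does not carry.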
f60ce88 to d5d601a
…table implementation
d5d601a to 3fd66a7
I may have complicated a few things. I kept getting style errors locally and tried to fix them, but I may have been doing something wrong. I tried stopping the daemon and rebuilding, yet something still went haywire. If needed, I can open a new PR with a single proper commit.
Please use fresh commits until we finish the review, to make it easier to see what changed in response to reviewers.
    if (o1 == null || o2 == null) {
      return false;
    }
    if (o1 instanceof BigDecimal && o2 instanceof BigDecimal) {
Why is this case needed? Doesn't BigDecimal have equals?
If it does, can this become Objects.equals()?
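One general Java fact bears on this question, although the PR does not say whether it is the reason for the special case: BigDecimal.equals compares scale as well as value, while compareTo compares value only. The sketch below uses hypothetical names and is not the PR's code.

```java
import java.math.BigDecimal;
import java.util.Objects;

// Hypothetical illustration of BigDecimal equality semantics.
final class BigDecimalEqualitySketch {
  /** Delegates to BigDecimal.equals, which also requires equal scale. */
  static boolean sameByEquals(BigDecimal a, BigDecimal b) {
    return Objects.equals(a, b);
  }

  /** Compares by numeric value only; treats null as never equal, like the snippet above. */
  static boolean sameByValue(BigDecimal a, BigDecimal b) {
    if (a == null || b == null) {
      return false;
    }
    return a.compareTo(b) == 0;
  }

  public static void main(String[] args) {
    BigDecimal x = new BigDecimal("2.0");
    BigDecimal y = new BigDecimal("2.00");
    System.out.println(sameByEquals(x, y)); // false: scales 1 and 2 differ
    System.out.println(sameByValue(x, y));  // true: same numeric value
    System.out.println(sameByEquals(null, null)); // true: Objects.equals treats two nulls as equal
  }
}
```

Note that Objects.equals(null, null) returns true, whereas the reviewed snippet returns false when either operand is null, so a direct swap would also change the null behavior.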
 * {@link CsvTableScan}.
 *
 * <p>Only equality conditions of the form {@code column = literal} can be
 * pushed down, because {@link CsvEnumerator} only supports per-column
Could this situation be improved? Is this a fundamental limitation of CsvEnumerator?
Maybe we need a more powerful enumerator.
In principle I think any predicate of the current row value should work.
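As a rough sketch of what "any predicate of the current row value" could look like, the following filters rows with an arbitrary java.util.function.Predicate instead of per-column equality values. This is not an existing Calcite API; the class name, method name, and sample rows are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical illustration; a real enumerator would stream rows from a file.
final class RowPredicateSketch {
  /** Returns the rows accepted by the predicate, in their original order. */
  static List<String[]> scan(List<String[]> rows, Predicate<String[]> rowFilter) {
    List<String[]> accepted = new ArrayList<>();
    for (String[] row : rows) {
      if (rowFilter.test(row)) {
        accepted.add(row);
      }
    }
    return accepted;
  }

  public static void main(String[] args) {
    // Made-up sample rows: empno, name, deptno.
    List<String[]> rows = new ArrayList<>();
    rows.add(new String[] {"100", "Fred", "10"});
    rows.add(new String[] {"110", "Eric", "20"});
    rows.add(new String[] {"120", "Wilma", "20"});

    // A predicate that per-column equality cannot express: a range plus a negation.
    Predicate<String[]> filter =
        row -> Integer.parseInt(row[2]) >= 20 && !row[1].equals("Eric");

    for (String[] row : scan(rows, filter)) {
      System.out.println(String.join(",", row)); // prints 120,Wilma,20
    }
  }
}
```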
    sql("model-with-custom-table", sql).ok();
  }

  /** Test case for
It would be nice to have higher coverage in terms of SQL types for columns.



Jira Link
[CALCITE-7618]
Changes Proposed
This PR implements filter pushdown support for the file adapter's CSV table using a planner-rule-based approach instead of a FilterableTable interface. This allows Calcite to make more intelligent planning decisions, estimate cost reductions, and display pushed-down predicates in EXPLAIN plans.

Implementation Details:
- CsvFilterTableScanRule, which matches a LogicalFilter on a CsvTableScan and pushes simple equality predicates (col = literal) into the scan.
- CsvProjectFilterTableScanRule, which matches LogicalProject → LogicalFilter → CsvTableScan and pushes down the filter first, preventing the planner from prematurely collapsing projects and filters into a generic EnumerableCalc and bypassing pushdown.
- CsvTableScan stores and propagates @Nullable String[] filterValues.
- CsvTableScan#computeSelfCost reduces planning cost proportionally to the number of pushed-down filters.
- CsvTableScan#explainTerms formats filters as filters=[[colIndex=value]] in EXPLAIN output.
- CsvTranslatableTable#scan(DataContext, int[], String[]), which is dynamically invoked by the generated code when filters are present.
- CsvEnumerator#converter is package-private so it can be reused inside CsvTranslatableTable to resolve correct row converters (ensuring single-column projections return raw objects rather than Object[] arrays, to prevent class cast errors).
- Tests in FileAdapterTest.java verify pushdown, projection combination, result correctness, and non-pushable residual filter persistence.
- testPushDownProjectAggregateWithFilter reflects the newly optimized scan plans.

To verify the change, run:
.\sqlline.bat -u "jdbc:calcite:model=file/src/test/resources/smart.json" -n admin -p admin -e "!set maxwidth 10000" -e "explain plan for select name, empno from EMPS where deptno = 20"
Before this change, the plan was:
PLAN=EnumerableCalc(expr#0..2=[{inputs}], expr#3=[20], expr#4=[=($t2, $t3)], NAME=[$t1], EMPNO=[$t0], $condition=[$t4])
CsvTableScan(table=[[SALES, EMPS]], fields=[[0, 1, 2]])
After this change, the filter and projection are pushed down into CsvTableScan, resulting in:
CsvTableScan(table=[[SALES, EMPS]], fields=[[1, 0]], filters=[[2=20]])
This demonstrates that the scan now reads only the required columns (name, empno) and applies the deptno = 20 filter during the table scan itself.
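The filters=[[2=20]] term in the plan above can be related to the per-column filterValues array with a small sketch. It assumes, without confirmation from the PR's source, that filterValues holds one entry per column with null meaning "no filter", and that the plan writer adds the outer brackets around each term. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not the PR's CsvTableScan or CsvEnumerator code.
final class FilterTermsSketch {
  /** Builds "colIndex=value" entries for every column that has a filter value. */
  static List<String> filterTerms(String[] filterValues) {
    List<String> terms = new ArrayList<>();
    for (int i = 0; i < filterValues.length; i++) {
      if (filterValues[i] != null) {
        terms.add(i + "=" + filterValues[i]);
      }
    }
    return terms;
  }

  /** Returns true if every filtered column of the row equals its filter value as a string. */
  static boolean rowMatches(String[] row, String[] filterValues) {
    for (int i = 0; i < filterValues.length; i++) {
      if (filterValues[i] != null && !filterValues[i].equals(row[i])) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // Assumed layout: columns 0 and 1 unfiltered, column 2 (deptno) must equal "20".
    String[] filterValues = {null, null, "20"};
    System.out.println("filters=[" + filterTerms(filterValues) + "]"); // filters=[[2=20]]
    System.out.println(rowMatches(new String[] {"110", "Eric", "20"}, filterValues)); // true
    System.out.println(rowMatches(new String[] {"100", "Fred", "10"}, filterValues)); // false
  }
}
```

Because the match is a plain string comparison, this sketch shares the limitation raised in the review: a cell containing 20.0 would not match the filter value 20.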