perf: replace CometBatchIterator FFI input path with the Arrow C Stream Interface #4572

Draft
mbutrovich wants to merge 62 commits into apache:main from mbutrovich:arrow-stream-reader

mbutrovich commented Jun 2, 2026

Which issue does this PR close?

Closes #3770.

Note: peeled off the draft experimental PR #4393 (which is not intended to merge), and for now also carries the commits from #4507 (native shuffle optimizations), so the diff is temporarily inflated and will shrink once #4507 merges. The description below covers only this PR's own scope (the Arrow C Stream Interface input path), not the #4507 native shuffle work.

Rationale for this change

  • The JVM-to-native input path used a bespoke CometBatchIterator plus a per-batch FFI deep copy, guarded by an arrow_ffi_safe flag, because the JVM could reuse or mutate a batch's buffers after handing it off. Every batch crossing the boundary was copied.
  • The Arrow C Stream Interface is the canonical, zero-copy way to hand an Arrow stream across FFI with proper ownership transfer, so both the deep copy and the flag become unnecessary.

What changes are included in this PR?

  • JVM exports each per-partition Iterator[ColumnarBatch] as an org.apache.arrow.c.ArrowArrayStream (Data.exportArrayStream); native takes ownership via from_raw. CometBatchIterator.java and the arrow_ffi_safe proto field/plumbing are removed.
  • CometExecIterator / CometExecRDD now pass an Array[Object] of already-exported ArrowArrayStream (or CometShuffleBlockIterator) slots instead of CometBatchIterator.
  • New ArrowReader implementations bridging Spark data to Arrow: RowArrowReader (InternalRow), SparkColumnarArrowReader (non-Arrow Spark ColumnarBatch), ColumnarBatchArrowReader (Arrow-backed ColumnarBatch, with VSR ownership transfer).
  • New CometNativeArrowSource trait: an operator supplies one per-partition reader and gets both the JVM columnar path (doExecuteColumnar) and the native C Stream path (doExecuteAsArrowStream). Implemented by CometLocalTableScanExec and CometSparkToColumnarExec.
  • Native AlignedArrowStreamReader wraps arrow-rs's stream reader to align buffers per imported batch (the JVM exports 8-byte-aligned buffers, which trip arrow-rs's alignment assertion). This is a temporary workaround: upstream arrow-rs#10030 ("Call align_buffers() in from_ffi, remove redundant call from arrow-pyarrow") fixes it and ships in arrow 59.0.0, after which this reader can be dropped. scan.rs drops the per-batch deep copy.
  • reconcileStreamSchema advertises the truthful first-batch Arrow schema (not the consumer's declared types) so native ScanExec's boundary cast fires; logs one deduped warning per type drift (e.g. width_bucket return-type drift).
  • Unrelated to the Arrow C Stream work, but too entangled to peel off cleanly from #4393 ("feat: enable CometLocalTableScanExec by default"): CometLocalTableScanExec now mixes in DataTypeSupport and runs a schema-level fallback in convert, so a LocalTableScanExec whose schema carries a type with no ArrowWriter coverage (Spark 4.1 TimeType, intervals, etc.) falls back to Spark instead of failing at the boundary. NullType is allow-listed since ArrowWriter handles it.
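
To make the alignment point above concrete, here is a minimal sketch of the decision rule only: a buffer needs realigning when its start address is not a multiple of the alignment the consumer expects for that buffer's type. This is not the AlignedArrowStreamReader or arrow-rs code. The `Buffer` class, the buffer names, the addresses, and the assumption that a 128-bit decimal buffer wants 16-byte alignment are all illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Buffer:
    name: str            # hypothetical label, for illustration only
    address: int         # start address of the buffer (made-up values below)
    required_align: int  # alignment the consumer expects, in bytes


def is_aligned(address: int, alignment: int) -> bool:
    """True when `address` is a multiple of `alignment` (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return address & (alignment - 1) == 0


def buffers_to_realign(buffers):
    """Names of the buffers that would have to be copied to aligned memory."""
    return [b.name for b in buffers if not is_aligned(b.address, b.required_align)]


# Every address below is 8-byte aligned or better for its own needs, except
# the buffer assumed to require 16-byte alignment.
batch = [
    Buffer("int64_values", 0x1008, 8),
    Buffer("decimal128_values", 0x2008, 16),  # assumed 16-byte requirement
    Buffer("validity", 0x3001, 1),
]

print(buffers_to_realign(batch))  # prints ['decimal128_values']
```

Under these assumptions only the misaligned buffer is copied, while already-aligned buffers stay zero-copy.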

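The schema reconciliation described in the list above can be sketched roughly as follows. This is a language-neutral illustration in Python, not the Scala reconcileStreamSchema itself. The `SchemaReconciler` class, the string type names, the column names, and the choice of deduplicating on the (declared, actual) type pair are all assumptions made for the example.

```python
import logging

logger = logging.getLogger("schema_reconcile_sketch")  # hypothetical logger name


class SchemaReconciler:
    """Advertise the first batch's real types and warn once per distinct drift."""

    def __init__(self):
        # (declared_type, actual_type) pairs that were already reported.
        self._warned = set()

    def reconcile(self, declared, actual):
        """Return the schema to advertise: the types the batch really carries."""
        for column, actual_type in actual.items():
            declared_type = declared.get(column)
            if declared_type is None or declared_type == actual_type:
                continue
            drift = (declared_type, actual_type)
            if drift not in self._warned:  # dedupe: one warning per drift
                self._warned.add(drift)
                logger.warning(
                    "type drift on %s: declared %s, batch carries %s",
                    column, declared_type, actual_type,
                )
        return dict(actual)

    @property
    def warning_count(self):
        return len(self._warned)


# Made-up schemas: the consumer declares int64, the first batch carries int32.
declared = {"id": "int64", "bucket": "int64"}
first_batch = {"id": "int64", "bucket": "int32"}

reconciler = SchemaReconciler()
advertised = reconciler.reconcile(declared, first_batch)
reconciler.reconcile(declared, first_batch)  # second partition, same drift

print(advertised["bucket"], reconciler.warning_count)  # prints: int32 1
```

Advertising the actual type is what lets a downstream cast see a mismatch at all. If the declared type were advertised instead, the drift would be invisible at the boundary.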
How are these changes tested?

  • Existing suites exercise the input path end to end (CometExecSuite, CometShuffleSuite, ParquetReadSuite, the fuzz suites).
  • New CometArrowStreamSuite covering stream export and schema reconciliation, added to the Linux and macOS PR build workflows.
  • New CometExecSuite cases for CometLocalTableScanExec: a TimeType schema-level fallback check (Spark 4.1+), plus two Arrow-buffer leak checks (project consumer and collect_list) that fail via the allocator leak detector if the per-batch buffers leak.

mbutrovich and others added 30 commits May 21, 2026 16:44
…nonical Arrow C Stream Interface (JVM Data.exportArrayStream <-> native ArrowArrayStreamReader), eliminating the per-batch FFI deep copy and the arrow_ffi_safe flag.
@mbutrovich mbutrovich added this to the 0.17.0 milestone Jun 2, 2026
@mbutrovich mbutrovich self-assigned this Jun 2, 2026
mbutrovich commented Jun 5, 2026

Ran TPC-H SF1000 with Comet on Spark 3.5.8:

[Charts: tpch_queries_compare, tpch_allqueries]

| Query | main (s) | #4507 (s) | #4572 (s) |
|---|---:|---:|---:|
| TPCH-01 | 10.495 | 10.151 | 10.722 |
| TPCH-02 | 23.455 | 22.177 | 21.629 |
| TPCH-03 | 12.366 | 14.064 | 10.347 |
| TPCH-04 | 14.067 | 13.360 | 11.927 |
| TPCH-05 | 30.023 | 29.119 | 27.642 |
| TPCH-06 | 1.128 | 0.710 | 0.942 |
| TPCH-07 | 16.368 | 15.073 | 13.730 |
| TPCH-08 | 36.566 | 35.680 | 33.435 |
| TPCH-09 | 44.722 | 41.845 | 40.223 |
| TPCH-10 | 29.848 | 29.106 | 26.074 |
| TPCH-11 | 15.373 | 14.837 | 15.752 |
| TPCH-12 | 7.919 | 7.486 | 6.638 |
| TPCH-13 | 10.194 | 10.103 | 9.761 |
| TPCH-14 | 2.631 | 2.672 | 2.389 |
| TPCH-15 | 11.613 | 12.524 | 12.202 |
| TPCH-16 | 11.207 | 9.778 | 8.745 |
| TPCH-17 | 31.002 | 28.755 | 29.156 |
| TPCH-18 | 62.856 | 56.601 | 56.502 |
| TPCH-19 | 9.253 | 9.319 | 8.556 |
| TPCH-20 | 9.015 | 8.610 | 7.722 |
| TPCH-21 | 74.530 | 69.337 | 71.159 |
| TPCH-22 | 10.640 | 9.574 | 9.813 |
| Total | 475.272 | 450.879 | 435.066 |

Looks like it further improves on #4507: 435.066 s total versus 450.879 s (about 3.5% lower), and about 8.5% lower than main (475.272 s).
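
For anyone who wants to redo the comparison, a small sketch of the arithmetic on the reported totals. The totals are copied from the table above. The helper name `pct_faster` is made up for this example.

```python
# Total runtimes in seconds, copied from the TPC-H table above.
totals = {"main": 475.272, "#4507": 450.879, "#4572": 435.066}


def pct_faster(baseline: float, candidate: float) -> float:
    """Percentage by which `candidate` is faster than `baseline`."""
    return (baseline - candidate) / baseline * 100.0


vs_main = pct_faster(totals["main"], totals["#4572"])
vs_4507 = pct_faster(totals["#4507"], totals["#4572"])

print(f"#4572 vs main:  {vs_main:.1f}% less total time")   # 8.5%
print(f"#4572 vs #4507: {vs_4507:.1f}% less total time")   # 3.5%
```

This only compares totals from single runs, so it says nothing about per-query variance.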

mbutrovich commented

TPC-DS SF1000. Some queries are faster, but others regress (for example TPCDS-26 and TPCDS-27), and the total of 657.612 s sits between main (681.881 s) and #4507 (639.129 s). I will investigate the regressions after #4507 to understand what's going on. We're force-aligning buffers, but that shouldn't be worse than the deep copies that were happening before.

[Charts: tpcds_queries_compare, tpcds_allqueries]

| Query | main (s) | #4507 (s) | #4572 (s) |
|---|---:|---:|---:|
| TPCDS-01 | 4.303 | 4.239 | 4.341 |
| TPCDS-02 | 5.111 | 5.413 | 5.240 |
| TPCDS-03 | 1.571 | 1.538 | 1.847 |
| TPCDS-04 | 22.027 | 21.094 | 20.312 |
| TPCDS-05 | 5.552 | 5.723 | 5.506 |
| TPCDS-06 | 1.838 | 1.693 | 1.695 |
| TPCDS-07 | 2.503 | 2.379 | 2.452 |
| TPCDS-08 | 1.601 | 1.454 | 1.565 |
| TPCDS-09 | 9.673 | 9.771 | 9.246 |
| TPCDS-10 | 2.920 | 2.973 | 2.839 |
| TPCDS-11 | 11.558 | 11.298 | 10.821 |
| TPCDS-12 | 1.101 | 1.083 | 1.220 |
| TPCDS-13 | 2.528 | 2.521 | 2.677 |
| TPCDS-14a | 31.533 | 28.518 | 28.985 |
| TPCDS-14b | 31.786 | 27.090 | 28.025 |
| TPCDS-15 | 1.639 | 1.815 | 1.616 |
| TPCDS-16 | 6.272 | 6.292 | 6.755 |
| TPCDS-17 | 3.577 | 3.671 | 3.214 |
| TPCDS-18 | 3.092 | 3.716 | 3.606 |
| TPCDS-19 | 1.648 | 1.585 | 1.541 |
| TPCDS-20 | 1.087 | 1.109 | 1.293 |
| TPCDS-21 | 0.779 | 0.721 | 0.858 |
| TPCDS-22 | 2.982 | 2.902 | 3.017 |
| TPCDS-23a | 35.984 | 34.227 | 34.237 |
| TPCDS-23b | 46.507 | 45.944 | 44.658 |
| TPCDS-24a | 22.057 | 22.422 | 23.147 |
| TPCDS-24b | 23.459 | 20.861 | 22.464 |
| TPCDS-25 | 2.376 | 2.361 | 2.266 |
| TPCDS-26 | 1.312 | 1.491 | 3.595 |
| TPCDS-27 | 2.193 | 2.259 | 4.269 |
| TPCDS-28 | 12.102 | 11.772 | 13.403 |
| TPCDS-29 | 6.224 | 6.129 | 6.098 |
| TPCDS-30 | 2.216 | 2.431 | 2.406 |
| TPCDS-31 | 4.839 | 4.737 | 4.705 |
| TPCDS-32 | 1.094 | 0.890 | 1.020 |
| TPCDS-33 | 1.404 | 1.351 | 1.457 |
| TPCDS-34 | 2.167 | 1.932 | 2.167 |
| TPCDS-35 | 4.462 | 4.379 | 4.315 |
| TPCDS-36 | 2.782 | 2.720 | 2.969 |
| TPCDS-37 | 2.912 | 2.412 | 3.386 |
| TPCDS-38 | 6.699 | 6.255 | 6.352 |
| TPCDS-39a | 5.590 | 2.514 | 2.841 |
| TPCDS-39b | 4.184 | 2.883 | 2.838 |
| TPCDS-40 | 3.036 | 2.812 | 3.119 |
| TPCDS-41 | 0.459 | 0.421 | 0.509 |
| TPCDS-42 | 0.877 | 0.898 | 0.782 |
| TPCDS-43 | 2.322 | 2.032 | 2.083 |
| TPCDS-44 | 7.295 | 6.833 | 7.581 |
| TPCDS-45 | 1.722 | 1.749 | 1.537 |
| TPCDS-46 | 2.902 | 2.633 | 2.702 |
| TPCDS-47 | 4.631 | 4.533 | 4.753 |
| TPCDS-48 | 2.403 | 2.208 | 2.770 |
| TPCDS-49 | 3.760 | 3.921 | 3.661 |
| TPCDS-50 | 16.671 | 16.843 | 15.976 |
| TPCDS-51 | 6.923 | 6.748 | 7.330 |
| TPCDS-52 | 0.886 | 0.766 | 1.054 |
| TPCDS-53 | 2.473 | 2.014 | 3.281 |
| TPCDS-54 | 5.476 | 5.625 | 5.763 |
| TPCDS-55 | 0.964 | 0.835 | 1.314 |
| TPCDS-56 | 1.543 | 1.148 | 1.674 |
| TPCDS-57 | 3.646 | 3.509 | 3.940 |
| TPCDS-58 | 1.456 | 1.267 | 1.363 |
| TPCDS-59 | 7.807 | 7.210 | 7.825 |
| TPCDS-60 | 1.775 | 1.516 | 1.696 |
| TPCDS-61 | 1.972 | 1.735 | 1.896 |
| TPCDS-62 | 3.638 | 3.172 | 4.188 |
| TPCDS-63 | 2.009 | 1.786 | 2.314 |
| TPCDS-64 | 11.576 | 11.327 | 11.500 |
| TPCDS-65 | 5.588 | 5.338 | 5.235 |
| TPCDS-66 | 2.639 | 2.474 | 2.695 |
| TPCDS-67 | 32.790 | 30.212 | 31.373 |
| TPCDS-68 | 5.592 | 1.607 | 1.402 |
| TPCDS-69 | 2.273 | 2.247 | 2.151 |
| TPCDS-70 | 6.047 | 4.519 | 5.405 |
| TPCDS-71 | 2.236 | 1.356 | 1.640 |
| TPCDS-72 | 9.072 | 8.243 | 8.033 |
| TPCDS-73 | 1.262 | 1.112 | 1.070 |
| TPCDS-74 | 8.803 | 8.449 | 8.124 |
| TPCDS-75 | 15.381 | 14.322 | 13.975 |
| TPCDS-76 | 6.932 | 6.637 | 6.885 |
| TPCDS-77 | 1.489 | 1.469 | 1.497 |
| TPCDS-78 | 15.289 | 14.661 | 14.086 |
| TPCDS-79 | 3.082 | 2.071 | 2.000 |
| TPCDS-80 | 5.332 | 5.350 | 5.082 |
| TPCDS-81 | 3.647 | 3.526 | 3.277 |
| TPCDS-82 | 4.227 | 3.861 | 4.941 |
| TPCDS-83 | 0.983 | 0.962 | 1.293 |
| TPCDS-84 | 2.682 | 2.223 | 2.503 |
| TPCDS-85 | 4.294 | 3.429 | 3.456 |
| TPCDS-86 | 1.504 | 1.396 | 1.355 |
| TPCDS-87 | 6.849 | 6.398 | 6.257 |
| TPCDS-88 | 21.062 | 20.543 | 20.878 |
| TPCDS-89 | 2.041 | 2.069 | 2.268 |
| TPCDS-90 | 3.313 | 3.734 | 3.390 |
| TPCDS-91 | 0.782 | 0.846 | 0.940 |
| TPCDS-92 | 1.077 | 0.975 | 1.011 |
| TPCDS-93 | 20.542 | 19.973 | 18.996 |
| TPCDS-94 | 5.443 | 5.285 | 5.355 |
| TPCDS-95 | 19.409 | 18.508 | 18.682 |
| TPCDS-96 | 4.313 | 3.873 | 4.326 |
| TPCDS-97 | 4.889 | 4.643 | 4.861 |
| TPCDS-98 | 1.496 | 1.433 | 1.506 |
| TPCDS-99 | 4.055 | 3.276 | 3.789 |
| Total | 681.881 | 639.129 | 657.612 |
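
One way to shortlist the queries worth investigating is to rank them by the ratio of #4572 time to #4507 time. The sketch below does this for a handful of rows copied from the table above. The 10% threshold and the helper name `regressions` are arbitrary choices for illustration, and single-run timings on short queries are noisy, so treat the output as a starting point only.

```python
# (query, main, #4507, #4572) in seconds: a subset of rows from the table above.
rows = [
    ("TPCDS-26", 1.312, 1.491, 3.595),
    ("TPCDS-27", 2.193, 2.259, 4.269),
    ("TPCDS-39a", 5.590, 2.514, 2.841),
    ("TPCDS-53", 2.473, 2.014, 3.281),
    ("TPCDS-55", 0.964, 0.835, 1.314),
    ("TPCDS-68", 5.592, 1.607, 1.402),
]


def regressions(rows, threshold=1.10):
    """Queries where #4572 is slower than #4507 by more than `threshold`x,
    worst first, as (query, slowdown_ratio) pairs."""
    ratios = [(query, pr_4572 / pr_4507) for query, _main, pr_4507, pr_4572 in rows]
    slow = [(query, ratio) for query, ratio in ratios if ratio > threshold]
    return sorted(slow, key=lambda item: item[1], reverse=True)


for query, ratio in regressions(rows):
    print(f"{query}: {ratio:.2f}x the #4507 time")
```

On this subset, TPCDS-26 and TPCDS-27 come out on top, while TPCDS-68 is filtered out because it is faster under #4572.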

Development

Successfully merging this pull request may close these issues.

Remove mutable buffer use from CometArrowConverters