
perf: avoid FFI import/export when passing batches between native plans #3930

Open
andygrove wants to merge 20 commits into apache:main from andygrove:batch-stash-shuffle-optimization

Conversation


@andygrove andygrove commented Apr 12, 2026

Which issue does this PR close?

Closes #3925.

Rationale for this change

When Comet has a native ShuffleWriter with a native child plan, batches are created in native code, exported to the JVM via Arrow FFI, then imported back into native code for the shuffle writer. The JVM never reads the data, so the FFI round-trip is pure overhead.

This PR improves performance and reduces JVM memory allocations. The impact may be larger for more complex schemas or larger benchmarks, but there is already a clear win in a local TPC-H @ SF100 comparison against the main branch.

[Image: tpch_queries_compare — per-query TPC-H SF100 timings vs. the main branch]

What changes are included in this PR?

Introduces a "batch stash" optimization that passes batches as opaque u64 handles through the JVM instead of doing full Arrow FFI export/import. This reduces the per-batch overhead from 4 FFI boundary crossings to 2 lightweight JNI calls passing a single long.

New components:

  • BatchStash (native/core/src/execution/batch_stash.rs) -- global Mutex<HashMap<u64, RecordBatch>> registry. The child plan stashes its output batch, and the shuffle writer's ScanExec retrieves it by handle.
  • CometHandleBatchIterator (Java + Rust JNI bridge) -- passes handles between the two native execution contexts through the JVM.
  • executePlanBatchHandle JNI function -- like executePlan but stashes the output batch instead of FFI-exporting it. Shared execution logic is extracted into execute_plan_impl with an OutputMode enum to avoid duplication.
  • CometShuffleWriterInputIterator -- preserves the CometExecIterator reference through Spark's shuffle dependency RDD so the shuffle writer can detect native input.

Modified components:

  • CometExecIterator -- gains stash mode (enableStashMode(), nextHandle()) for producing handles instead of ColumnarBatch.
  • ScanExec -- gains handle_mode flag. When true, retrieves batches from the BatchStash instead of via Arrow FFI import.
  • Scan protobuf message -- new bool batch_stash_handle field signals handle mode to the native planner.
  • CometNativeShuffleWriter -- detects native child plans automatically and enables the stash path.

Detection is automatic -- no configuration needed. When a CometExecIterator feeds into a native shuffle writer, the optimization activates. Non-native child plans fall back to the existing FFI path.

How are these changes tested?

  • Existing CometNativeShuffleSuite (22 tests) passes -- validates correctness of the stash path since all native shuffle queries now use it.
  • Existing CometShuffleSuite passes -- validates the fallback FFI path for columnar shuffle.
  • New Rust unit tests for BatchStash (stash/take semantics, handle uniqueness, cleanup).
  • Clippy clean, formatted.

Design for avoiding unnecessary Arrow FFI import/export when passing batches between two native plans (issue apache#3925). Uses a native-side batch registry to pass opaque handles through the JVM instead of full Arrow FFI round-trips.

  • Extract shared execution logic into execute_plan_impl with an OutputMode enum
  • Replace stringly-typed handle mode detection with a HANDLE_SCAN_SOURCE constant
  • Remove the no-op catch/throw in CometExecIterator.nextHandle()
  • Remove the unnecessary #![allow(dead_code)] from the batch_stash module
  • Remove the unnecessary @volatile from the stashMode field
arrow_ffi_safe: bool,
/// When true, input comes from a CometHandleBatchIterator and batches are
/// retrieved from the BatchStash instead of via Arrow FFI import.
pub handle_mode: bool,

@andygrove (author) commented:

We should probably combine arrow_ffi_safe and handle_mode into an enum, but I would prefer to do that as a follow-on PR. Something like:

enum TransferMode {
  FfiSafe,
  FfiUnsafe,
  Handle,
}
@andygrove andygrove changed the title feat: avoid FFI import/export when passing batches between native plans perf: avoid FFI import/export when passing batches between native plans Apr 12, 2026
…conciliation

Stashed RecordBatches are already fully formed by the child plan.
Decomposing them into columns and rebuilding via build_record_batch
caused assertion failures when the ScanExec schema (from protobuf
data_types) didn't exactly match the batch schema.
…ches

Stashed batches may have type differences from the ScanExec schema
(e.g., Timestamp without timezone vs with timezone). When column
counts match, delegate to build_record_batch for casting. When they
don't match, return the batch as-is.
@andygrove andygrove marked this pull request as ready for review April 12, 2026 19:19
@andygrove andygrove requested a review from wForget April 12, 2026 19:21
.intConf
.createWithDefault(1)

val COMET_SHUFFLE_BATCH_STASH_ENABLED: ConfigEntry[Boolean] =

@andygrove (author) commented:

Added this config for now in case we discover bugs, but I plan to remove it in the future.

static NEXT_HANDLE: AtomicU64 = AtomicU64::new(1);

/// Global stash mapping handles to RecordBatch values.
static STASH: Lazy<Mutex<HashMap<u64, RecordBatch>>> = Lazy::new(|| Mutex::new(HashMap::new()));

A reviewer commented:

Must this be a global? Is there any leak risk? Do we need some cleanup to remove stale entries?

let handle = NEXT_HANDLE.fetch_add(1, Ordering::Relaxed);
STASH
.lock()
.expect("batch_stash lock poisoned")

A reviewer commented:

Should this return an error instead of panicking? e.g., lock().unwrap_or_else(|e| e.into_inner()).

if columns.len() == self.schema.fields().len() {
// Column counts match. Use build_record_batch to handle any
// type differences (e.g., timestamp timezone casting).
let maybe_batch = self.build_record_batch(columns, num_rows);

A reviewer commented:

If the batch is already complete, why does it need build_record_batch?

/// A complete RecordBatch retrieved from the BatchStash. Bypasses
/// `build_record_batch` since the batch is already fully formed.
Complete(RecordBatch),


A reviewer commented:

And in shuffle_scan.rs, the Complete batch is returned directly. That looks inconsistent.

} else {
// Column count mismatch (e.g., empty schema scan).
// Return the stashed batch as-is since it's already valid.
Poll::Ready(Some(Ok(batch.clone())))

A reviewer commented:

It might be semantically clearer to take the batch here instead of cloning it?


@viirya left a comment:

A high-quality performance-improvement PR. Only a few comments.

Thanks @andygrove
