Refactor hash join build-report lifecycle into BuildReportHandle #22623

Open
kosiew wants to merge 4 commits into apache:main from kosiew:lifecycle-management-22622
@kosiew kosiew commented May 29, 2026

Which issue does this PR close?

Rationale for this change

The build-report lifecycle for hash-join partitions was previously spread across HashJoinStream, OnceFut polling, and drop-time cancellation logic. Although correctness around scheduled versus delivered reports had already been addressed, the lifecycle ownership remained fragmented and difficult to reason about.

This change centralizes lifecycle management into a dedicated abstraction, making state transitions explicit and ensuring drop-time behavior is self-contained and easier to maintain.

What changes are included in this PR?

  • Introduce a new BuildReportHandle type that owns the lifecycle of a partition's build-data report.

  • Consolidate report lifecycle state management into explicit states:

    • NotReported
    • Scheduled
    • Delivered
    • Canceled
    • Finalized
  • Move report scheduling, delivery tracking, cancellation, and finalization logic out of HashJoinStream and into BuildReportHandle.

  • Implement drop-safe behavior in BuildReportHandle so pending scheduled reports are canceled when dropped before delivery.

  • Replace the stream's separate accumulator, waiter, and lifecycle state fields with a single build_report handle.

  • Add test-only helpers in shared_bounds.rs to construct partitioned accumulators and inspect completed partition counts.

  • Update lifecycle documentation to reflect the new centralized ownership model and state transitions.
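
The state transitions and drop-time cancellation listed above can be illustrated with a minimal sketch. This is not the code in this PR: `Accumulator`, its `canceled` counter, and the synchronous `deliver()` method are hypothetical stand-ins (the real handle marks delivery while polling an `OnceFut`), and the no-accumulator path going straight to `Finalized` is an assumption based on the test names.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for the shared accumulator: it only counts
/// how many partitions were reported as canceled.
#[derive(Default)]
struct Accumulator {
    canceled: AtomicUsize,
}

impl Accumulator {
    fn report_canceled_partition(&self) {
        self.canceled.fetch_add(1, Ordering::SeqCst);
    }

    fn canceled(&self) -> usize {
        self.canceled.load(Ordering::SeqCst)
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BuildReportState {
    NotReported,
    Scheduled,
    Delivered,
    Canceled,
    Finalized,
}

/// Owns the lifecycle of one partition's build report (illustrative only).
struct BuildReportHandle {
    accumulator: Option<Arc<Accumulator>>,
    state: BuildReportState,
}

impl BuildReportHandle {
    fn new(accumulator: Option<Arc<Accumulator>>) -> Self {
        Self { accumulator, state: BuildReportState::NotReported }
    }

    fn state(&self) -> BuildReportState {
        self.state
    }

    /// NotReported -> Scheduled, or -> Finalized when there is nothing to report to
    /// (assumed fallback).
    fn schedule(&mut self) {
        if self.state == BuildReportState::NotReported {
            self.state = match self.accumulator {
                Some(_) => BuildReportState::Scheduled,
                None => BuildReportState::Finalized,
            };
        }
    }

    /// Scheduled -> Delivered. Synchronous simplification of the polling path.
    fn deliver(&mut self) {
        if self.state == BuildReportState::Scheduled {
            self.state = BuildReportState::Delivered;
        }
    }

    /// Scheduled -> Canceled. A no-op in every other state, so it is idempotent.
    fn cancel_pending(&mut self) {
        if self.state == BuildReportState::Scheduled {
            if let Some(acc) = &self.accumulator {
                acc.report_canceled_partition();
            }
            self.state = BuildReportState::Canceled;
        }
    }
}

impl Drop for BuildReportHandle {
    /// Dropping a handle whose report is still pending cancels it.
    fn drop(&mut self) {
        self.cancel_pending();
    }
}
```

In this sketch, dropping a handle in the `Scheduled` state increments the accumulator's cancel count exactly once, while dropping one that is `Delivered`, `Canceled`, or `Finalized` leaves the accumulator untouched.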

Are these changes tested?

Yes.

New tests were added covering lifecycle behavior and invariants:

  • report_canceled_partition_is_noop_after_report
  • report_canceled_partition_marks_pending_partition_canceled
  • build_report_handle_cancels_scheduled_partition_on_drop
  • build_report_handle_does_not_cancel_delivered_partition_on_drop
  • build_report_handle_cancel_pending_is_idempotent
  • build_report_handle_no_accumulator_finalizes

These tests verify correct cancellation behavior, delivery tracking, terminal-state handling, and idempotency.

Are there any user-facing changes?

No. This is an internal refactoring of hash-join build-report lifecycle management and does not change user-facing behavior.

LLM-generated code disclosure

This PR includes LLM-generated code and comments. All LLM-generated content has been manually reviewed and tested.

kosiew added 3 commits May 29, 2026 20:48
- Added BuildReportHandle to streamline operations
- Centralized the schedule, deliver, cancel, and finalize lifecycle processes
- Moved OnceFut waiter functionality into the handle for better management
- Simplified the drop and wait path for HashJoinStream
- Introduced three focused lifecycle tests for improved coverage
- Added test-only helpers in shared_bounds for stream lifecycle tests
- Reused existing accumulator test setup for efficiency
- Updated `wait_for_delivery()` to directly set `Delivered` after a successful `fut.get_shared(cx)`.
- Removed the `mark_delivered()` helper to prevent independent marking of delivered states in tests and internal code.
- Eliminated redundant stream-level drop cleanup entry point; cleanup is now handled by `BuildReportHandle`’s Drop implementation.
- Updated the delivered-path test for improved verification of the waiter flow:
  - Changed test to use a one-partition accumulator.
  - Polls `wait_for_delivery()` with a noop waker/context.
  - Asserts that the completion count is 1 to confirm expected behavior without extra cancellation side effects.
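
Polling with a noop waker/context, as the delivered-path test above does, can be sketched with the standard library alone. The names `NoopWake` and `poll_once` are hypothetical and not taken from this PR, and a ready-made future stands in for the handle's delivery future.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// A waker that does nothing when woken; enough for a single manual poll.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls `fut` exactly once with a noop waker and returns the result.
fn poll_once<F: Future + Unpin>(fut: &mut F) -> Poll<F::Output> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    Pin::new(fut).poll(&mut cx)
}
```

A test can then assert on the returned `Poll` directly, for example checking that `poll_once(&mut std::future::ready(1usize))` is `Poll::Ready(1)`, without spinning up an async runtime.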
…tream modules

- shared_bounds.rs:
  - Removed unused top-level test helper
  - Removed forwarding test wrapper
  - Kept only cross-module helpers at the top level

- stream.rs:
  - Added partitioned_handle test helper
  - Removed repeated BuildReportHandle::new(...) boilerplate
  - Simplified noop context creation
  - Documented defensive no-accumulator fallback in schedule
@github-actions github-actions bot added the physical-plan (Changes to the physical-plan crate) label May 29, 2026
- Renamed `wait_for_delivery` to `poll_delivery`
- Renamed `cancel_if_pending` to `cancel_pending`
- Implemented `Debug`, `PartialEq`, and `Eq` for `BuildReportState`
- Added test-only `state()` accessor
- Included state assertions in tests
- Added no-accumulator finalize test
- Updated idempotency test name to align with new method naming
@kosiew kosiew marked this pull request as ready for review May 29, 2026 13:33
@kosiew kosiew requested a review from adriangb May 29, 2026 13:33
Development

Successfully merging this pull request may close these issues.

[Refactor] Introduce a build-report lifecycle handle for hash-join partitions