orca: keep parallel semi joins from rewriting to dedup#1764

Open
aviralgarg05 wants to merge 1 commit into
apache:mainfrom
aviralgarg05:fix-1593-orca-parallel-plan-instability

@aviralgarg05
Contributor

Fixes #1593

What does this PR do?

This change keeps ORCA from rewriting semi-joins into inner-join-plus-deduplicate plans when parallel planning is enabled.

The issue behind orca_parallel was plan instability: in parallel mode, ORCA could choose rewrite paths that produced a dedup-based fallback shape instead of a direct semi-join. That made the expected plan unstable and, more importantly, pushed ORCA toward a plan form that is not desirable for parallel semi-join execution.

The fix is intentionally small and focused:

  • In CXformLeftSemiJoin2InnerJoin
  • In CXformLeftSemiJoin2InnerJoinUnderGb

both rewrites now decline to fire when gpdb::IsParallelModeOK() is true.

That preserves the existing rewrite behavior for non-parallel planning, while keeping parallel semi-joins on the direct semi-join path.
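The guard described above can be sketched in isolation as follows. This is a minimal standalone illustration, not the patched ORCA code: `PlannerState`, `IsParallelModeOK(const PlannerState &)`, and `SemiJoinRewritePromise` are hypothetical stand-ins, and the enum only mirrors the `ExfpNone` name used in the patch. In the patch itself, `gpdb::IsParallelModeOK()` is called with no arguments.

```cpp
// Stand-in for the xform promise levels; only ExfpNone appears in the patch,
// the other enumerators are assumed here for illustration.
enum EXformPromise { ExfpNone, ExfpLow, ExfpMedium, ExfpHigh };

// Hypothetical planner state. The real gpdb::IsParallelModeOK() takes no
// arguments; this struct exists only so the sketch is self-contained.
struct PlannerState
{
	bool parallel_mode_ok;
};

// Hypothetical stand-in for gpdb::IsParallelModeOK().
inline bool
IsParallelModeOK(const PlannerState &state)
{
	return state.parallel_mode_ok;
}

// Promise for a semi-join -> inner-join (+ dedup) rewrite.
// In parallel mode the rewrite declines to fire; otherwise it keeps
// whatever promise the serial path would have returned.
inline EXformPromise
SemiJoinRewritePromise(const PlannerState &state, EXformPromise serial_promise)
{
	if (IsParallelModeOK(state))
	{
		return ExfpNone;
	}
	return serial_promise;
}
```

With this shape, a parallel-enabled state always yields `ExfpNone` for the rewrite, while a non-parallel state passes the serial promise through unchanged.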

Type of Change

  • Bug fix (non-breaking change)
  • New feature (non-breaking change)
  • Breaking change (fix or feature with breaking changes)
  • Documentation update

Breaking Changes

None.

Test Plan

What I verified:

  • Unit tests added/updated
  • Integration tests added/updated
  • Passed make installcheck
  • Passed make -C src/test installcheck-cbdb-parallel

Manual verification performed:

  • Confirmed the touched ORCA xform objects compile cleanly
  • Rebuilt the modified xform objects twice to make sure the change is deterministic

Notes:

  • I attempted a fuller local build and regression run on macOS, but the tree currently hits unrelated baseline build failures with the local Apple toolchain before reaching the full regression path. The failures were outside this patch, so I did not modify unrelated files just to get the environment through.

Impact

Performance:

No broad performance change is intended.

This only affects ORCA’s choice space in parallel semi-join planning by preventing two rewrite paths that can lead to unstable dedup-based plans. In practice, this should make plan selection more predictable for the affected parallel queries.

User-facing changes:

Users should see more stable ORCA plans for affected parallel semi-join queries.

There is no SQL behavior change and no syntax change.

Dependencies:

None.

Checklist

Additional Context

This patch is intentionally narrow.

I avoided broader planner changes and did not touch regression files without a verified regenerated answer file from a clean local run. The code change is limited to the two ORCA xforms responsible for the semi-join-to-dedup rewrite path.

CI Skip Instructions

No CI skip requested.

*/
if (gpdb::IsParallelModeOK())
{
	return ExfpNone;
Member

Disabling them removes a plan candidate that may have been the cost-optimal choice, even for some serial queries in a parallel-enabled session. We need to find the root cause.
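
A toy illustration of the concern, with made-up costs and hypothetical names (`Candidate`, `CheapestShape`); it does not model ORCA's actual search or cost model. It only shows that filtering a candidate out before costing can change which plan wins:

```cpp
#include <limits>
#include <string>
#include <vector>

// One plan alternative. from_disabled_xform marks alternatives that would
// only exist if the guarded rewrites were allowed to fire.
struct Candidate
{
	std::string shape;
	double cost;
	bool from_disabled_xform;
};

// Returns the shape of the cheapest candidate, or an empty string if none
// qualifies. When skip_disabled is true, alternatives produced by the
// guarded rewrites are never considered.
std::string
CheapestShape(const std::vector<Candidate> &candidates, bool skip_disabled)
{
	std::string best;
	double best_cost = std::numeric_limits<double>::infinity();
	for (const Candidate &c : candidates)
	{
		if (skip_disabled && c.from_disabled_xform)
		{
			continue;
		}
		if (c.cost < best_cost)
		{
			best_cost = c.cost;
			best = c.shape;
		}
	}
	return best;
}
```

If the dedup-based alternative happens to be the cheaper one for a given query, skipping it forces the more expensive shape, which is why the guard condition matters.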

@yjhjstz yjhjstz self-requested a review May 25, 2026 18:34
Development

Successfully merging this pull request may close these issues.

[Bug] Parallel query plan changes, plan is unstable