
docs: Rewrite operators page to show complete overview of what is and is not supported by Comet#4563

Draft
andygrove wants to merge 3 commits into apache:main from andygrove:docs-operator-support-reference

@andygrove (Member) commented on Jun 2, 2026

Which issue does this PR close?

N/A

See rendered versions:

Rationale for this change

The user guide has a Supported Spark Operators page, but it only lists the operators Comet replaces; it does not tell users what is unsupported or what is planned. Following the approach #4550 took for expressions, this PR turns the operators page into a complete, status-aware reference so users can see at a glance whether a given Spark physical operator is supported, supported with caveats, planned, or not currently planned.

What changes are included in this PR?

Rewrites docs/source/user-guide/latest/operators.md into a complete reference:

  • A four-bucket status legend (✅ Supported, ⚠️ Supported with caveats, 🔜 Planned, 💤 Not currently planned), matching the expression reference.
  • A Not currently planned section for operator families that fall back by design (Structured Streaming operators, Cartesian / cross joins, sampling and range generation).
  • Per-category tables (Scans, Projection and filtering, Sorting and limiting, Aggregation, Joins, Exchanges, Window, Generators and set operations, Writes, Python and UDF).

Support status is derived from the createExecEnabledConfig defaults in CometConf (for example, window and takeOrderedAndProject are enabled by default, while localTableScan is disabled by default), plus the operator handling in CometExecRule. Notably, this corrects a stale note: WindowExec is now enabled by default, not disabled.
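As a rough illustration of how a per-operator default feeds the status column, the sketch below models the three flags named above. It assumes the flags created by createExecEnabledConfig follow a `spark.comet.exec.<name>.enabled` key pattern; the `EXEC_DEFAULTS` table and the `config_key`, `default_status`, and `enable_args` helpers are hypothetical and cover only those three examples, not the real contents of CometConf.

```python
# Hypothetical model of the three per-operator defaults named in this PR.
# Assumption: keys follow the pattern spark.comet.exec.<name>.enabled.
EXEC_DEFAULTS = {
    "window": True,
    "takeOrderedAndProject": True,
    "localTableScan": False,
}


def config_key(exec_name: str) -> str:
    """Build the (assumed) Spark config key that toggles one operator."""
    return f"spark.comet.exec.{exec_name}.enabled"


def default_status(exec_name: str) -> str:
    """Describe an operator's default; names outside this sketch are rejected."""
    if exec_name not in EXEC_DEFAULTS:
        raise KeyError(f"{exec_name} is not modeled in this sketch")
    return "enabled by default" if EXEC_DEFAULTS[exec_name] else "disabled by default"


def enable_args(exec_name: str) -> list:
    """spark-submit style arguments that would opt in to one operator."""
    return ["--conf", f"{config_key(exec_name)}=true"]


if __name__ == "__main__":
    for name in EXEC_DEFAULTS:
        print(f"{config_key(name)}: {default_status(name)}")
```

Under the assumed key pattern, an operator that is off by default (such as localTableScan here) would need an explicit opt-in like the one `enable_args` builds before Comet handles it.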

This is a draft because the status of several not-yet-supported operators is an initial proposal and would benefit from maintainer confirmation, in particular:

  • InMemoryTableScanExec (marked Planned, could be Not currently planned)
  • SortAggregateExec (Planned vs Not currently planned)
  • WindowGroupLimitExec and the Python / UDF operators (Planned references)

Open issues / PRs referenced for Planned items: #4429 (nested loop join), #4234 (PyArrow UDFs), #4393 (LocalTableScan default), #2721 (window functions).

How are these changes tested?

Documentation only. Rendered locally and checked with prettier.

@andygrove andygrove marked this pull request as ready for review June 2, 2026 13:20
- Rewrite datatypes.md to match the status-aware reference style
  introduced for operators and expressions, listing every Spark data
  type by status (✅ / ⚠️ / 🔜 / 💤) with caveats and tracking issues.
- In operators.md, link each scan row to the relevant compatibility
  doc (Parquet Scan Compatibility, Iceberg Guide).
- Note that LocalTableScanExec is disabled by default because there
  is no acceleration advantage and it is typically only used in
  test code.

[skip ci]
| ✅ Supported | Native support; enabled by default. |
| ⚠️ Supported (caveats) | Works, but with limits: certain values, contexts, or configurations fall back to Spark. |
| 🔜 Planned | Intended; tracked by an open issue or pull request. |
| 💤 Not currently planned | Not on the current roadmap; queries referencing this type fall back to Spark and may be reconsidered. |
Contributor

Perhaps TBD instead of zzz? Also, Planned should probably use a different icon, since the current one is not very clear in dark mode.

@andygrove andygrove marked this pull request as draft June 2, 2026 21:08
@coderfender (Contributor) left a comment

minor comments

…-scan caveats

Parquet-scan-specific caveats (ShortType INT16/UINT_8 disambiguation,
Decimal binary encoding, datetime rebasing, TimestampNTZ read conversion)
are documented in the Parquet scan compatibility guide, so they no longer
downgrade the type status here. Supported types now show a green checkmark.

[skip ci]
@andygrove andygrove force-pushed the docs-operator-support-reference branch from a4b3024 to 024b4de

Labels

None yet

Projects

None yet


2 participants