feat: add informational message channel distinct from fallback reasons #4509
Draft
andygrove wants to merge 4 commits into
Rename withInfo/withInfos/hasExplainInfo and EXTENSION_INFO to withFallbackReason/withFallbackReasons/hasFallbackReason and FALLBACK_REASONS to match their actual semantics (fallback reasons, not generic info). Also rename the private extensionInfo helper in ExtendedExplainInfo to fallbackReasons, and update the TreeNodeTag string from "CometExtensionInfo" to "CometFallbackReasons" so a future PR can reuse the old string for a distinct tag.
…skip ci] When date_format gets a natively-supported format string but the session timezone is non-UTC and allowIncompatible is off, Comet takes the JVM codegen path. Emit a COMET-INFO hint on the expression and lift expression-level info messages onto the converted operator centrally in CometExecRule, so verbose extended explain shows the faster native option and how to enable it.
# Conflicts:
#   spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala
#   spark/src/main/scala/org/apache/comet/ExtendedExplainInfo.scala
#   spark/src/main/scala/org/apache/comet/serde/contraintExpressions.scala
#   spark/src/main/scala/org/apache/comet/serde/datetime.scala
#   spark/src/main/scala/org/apache/comet/serde/math.scala
#   spark/src/main/scala/org/apache/comet/serde/statics.scala
#   spark/src/main/scala/org/apache/comet/serde/strings.scala
#   spark/src/main/scala/org/apache/comet/serde/structs.scala
#   spark/src/main/scala/org/apache/comet/serde/unixtime.scala
Which issue does this PR close?
Closes #4006.
Depends on and is stacked on #4508 (the `withInfo` -> `withFallbackReason` rename). Because the two branches live on a fork, this PR targets `main` and therefore currently includes #4508's rename commit in its diff. Please review #4508 first; once it merges, rebase will reduce this PR to just the feature commits below.
Rationale for this change
Comet only had one way to tag a plan node with a message, and that message always meant "this node falls back to Spark". There was no way to attach a purely informational note that does not trigger fallback. This is increasingly useful with codegen dispatch: when Comet runs a JVM implementation of an expression even though a faster native implementation exists behind a config, we want to tell the user about the faster path without that note being treated as a fallback.
What changes are included in this PR?
- `CometSparkSessionExtensions.withInfo(node, message)` records a message on a new `CometExplainInfo.EXTENSION_INFO` tag. It does not cause fallback: no planning rule reads this tag.
- Verbose extended explain renders info messages as a `[COMET-INFO: ...]` segment, in addition to any `[COMET: ...]` fallback segment on the same node. The fallback explain list format is unchanged and still excludes info messages.
- Expression-level info messages are lifted onto the converted operator in `CometExecRule.convertToComet` (a single central rollup, applied to all native operators), because verbose explain only traverses plan nodes, not expressions.
- `CometDateFormat` emits a `[COMET-INFO: ...]` hint when a natively-supported format is requested but native execution is gated off (non-UTC session timezone with `allowIncompatible` disabled), so Comet runs the JVM codegen path. The hint names the exact config key to enable the faster native path.
Known limitation for future work: the Spark 4.x `CometExprShim` node reconstruction copies `FALLBACK_REASONS` but not `EXTENSION_INFO` onto the wrapping `Invoke`. No current code path routes `withInfo` through those shims, so this is latent. It can be addressed if a future serde tags one of those reconstructed nodes.
How are these changes tested?
New tests in `CometExpressionSuite`:
- `withInfo` does not set a fallback reason and renders as `[COMET-INFO: ...]` in verbose explain, and a second message accumulates rather than overwriting.
- `date_format` takes the JVM codegen path under a non-UTC timezone and surfaces the `[COMET-INFO: ...]` hint naming the `DateFormatClass.allowIncompatible` config key.
The full `CometExpressionSuite` passes (125 succeeded), confirming the central `convertToComet` rollup does not regress operator conversion. `scalastyle:check` passes.
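For reviewers who want a mental model of the behavior the tests check, here is a minimal, self-contained sketch. It is not the code in this PR and does not use Spark's or Comet's real classes: `Node`, `Tag`, `liftInfo`, and `render` are hypothetical stand-ins, and the `", "` separator inside a segment is an assumption. It only illustrates the three properties described above: info messages accumulate, they never count as a fallback reason, and expression-level messages are rolled up onto the operator before rendering.

```scala
import scala.collection.mutable

// Toy model only: none of these types are Comet's or Spark's real classes.
object InfoChannelSketch {

  // Hypothetical tags, loosely mirroring FALLBACK_REASONS and EXTENSION_INFO.
  sealed trait Tag
  case object FallbackReasons extends Tag
  case object ExtensionInfo extends Tag

  // A plan operator or expression that can carry a set of messages per tag.
  final class Node(val name: String, val children: Seq[Node] = Nil) {
    private val tags = mutable.Map.empty[Tag, mutable.LinkedHashSet[String]]

    // Adding to an existing set accumulates instead of overwriting.
    def add(tag: Tag, message: String): Unit =
      tags.getOrElseUpdate(tag, mutable.LinkedHashSet.empty[String]) += message

    def messages(tag: Tag): Seq[String] =
      tags.get(tag).map(_.toSeq).getOrElse(Seq.empty)
  }

  def withFallbackReason(node: Node, message: String): Node = {
    node.add(FallbackReasons, message)
    node
  }

  def withInfo(node: Node, message: String): Node = {
    node.add(ExtensionInfo, message)
    node
  }

  // Only the fallback tag is consulted here; the info tag is never read.
  def hasFallbackReason(node: Node): Boolean =
    node.messages(FallbackReasons).nonEmpty

  // Central rollup: copy info messages from expressions (and their
  // sub-expressions) onto the operator that will be rendered.
  def liftInfo(operator: Node, expressions: Seq[Node]): Node = {
    def collect(e: Node): Seq[String] =
      e.messages(ExtensionInfo) ++ e.children.flatMap(collect)
    expressions.flatMap(collect).foreach(m => withInfo(operator, m))
    operator
  }

  // Render one line: the fallback segment first, then the info segment.
  def render(node: Node): String = {
    val fallback = node.messages(FallbackReasons)
    val info = node.messages(ExtensionInfo)
    val segments =
      (if (fallback.nonEmpty) Seq(fallback.mkString("[COMET: ", ", ", "]")) else Nil) ++
        (if (info.nonEmpty) Seq(info.mkString("[COMET-INFO: ", ", ", "]")) else Nil)
    (node.name +: segments).mkString(" ")
  }
}
```

In this model, tagging an expression twice with `withInfo` and then calling `liftInfo` on its operator yields a single `[COMET-INFO: ...]` segment holding both messages, while `hasFallbackReason` stays false for both nodes.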