[fix](variant) Avoid mutating shared variant columns#64092
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
9566c5e to
ff10f78
Compare
|
run buildall |
ff10f78 to
2269005
Compare
|
run buildall |
2269005 to
00d2917
Compare
|
run buildall |
|
run external |
TPC-H: Total hot run time: 29484 ms |
TPC-DS: Total hot run time: 169084 ms |
|
/review |
TPC-H: Total hot run time: 29117 ms |
|
run p0 |
|
run cloud_p0 |
|
run nonConcurrent |
TPC-DS: Total hot run time: 169178 ms |
TPC-H: Total hot run time: 28323 ms |
TPC-DS: Total hot run time: 169386 ms |
There was a problem hiding this comment.
Code review completed for PR #64092.
No blocking issues found. The changes are focused on avoiding in-place mutation of shared BE column trees after local exchange, and the implementation consistently uses private finalized/deep-cloned copies for Variant serialization/casts and a detached probe block for lazy nested-loop join materialization.
Critical checkpoint conclusions:
- Goal and tests: The code addresses the stated shared-column mutation problem. Added/updated unit coverage exercises nullable CRC hashing, Variant cast/serialization non-mutation, finalized Variant deep copy, and the relevant lazy NLJ nullable append behavior described in the PR.
- Scope/focus: The change is reasonably small for the affected paths and avoids broad behavioral changes or storage/protocol format changes.
- Concurrency/COW: No new thread or dependency behavior is introduced. The reviewed paths avoid mutating columns that may be shared across downstream pipeline tasks; recursive/deep cloning and lazy probe-block detachment look appropriate.
- Lifecycle/static initialization: No new static initialization or unusual lifecycle hazard found.
- Configuration/compatibility: No new config items and no incompatible serialization layout changes found.
- Parallel paths: Block-level Variant serialization pre-finalizes unfinalized top-level Variant columns, while datatype-level Variant serialization still protects nested/fallback callers. Cast and lazy NLJ paths are covered separately.
- Error handling: New Status-returning paths propagate errors with RETURN_IF_ERROR/RETURN_IF_CATCH_EXCEPTION. No ignored Status found in the diff.
- Memory/performance: The fixes intentionally trade extra cloning for correctness on shared columns. Block serialization avoids cloning the same top-level Variant twice; nested Variant datatype callers may still clone independently for sizing/serialization, but I did not find a correctness issue.
- Observability/transactions/persistence: Not applicable; no transaction, persistence, or user-facing observability changes.
- User focus points: No additional user-provided review focus was present.
Review note: I inspected the diff and related call paths and ran git diff --check for the PR diff; I did not rerun the full BE unit/build suite in this review runner.
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
|
run external |
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
1 similar comment
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
1 similar comment
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
00d2917 to
9eb7865
Compare
|
Latest local verification after updating head to
|
b5fde9f to
761a984
Compare
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: Queries that evaluate VARIANT expressions after local exchange can share input blocks across downstream pipeline tasks. Variant casts and Variant serialization finalized source columns in-place, so one consumer could mutate a shared input column while another consumer still expected the original column shape and row count. This made local-shuffle query results unstable and could trigger later operators to observe changed Variant column contents or sizes. This change confines the fix to Variant handling by using private finalized Variant copies for cast and serialization paths instead of mutating the source column.
### Release note
None
### Check List (For Author)
- Test:
- PATH=/tmp/doris-clang-format-bin:$PATH build-support/clang-format.sh
- git diff --check HEAD^
- ./build.sh --be
- ./run-be-ut.sh --run --filter='ColumnVariantTest.serialize_does_not_finalize_source_column:ColumnVariantTest.block_serialize_does_not_finalize_source_column:FunctionVariantCast.CastFromVariantDoesNotFinalizeSourceColumn:FunctionVariantCast.CastFromVariant'
- Manual test: with column_nullable.cpp and column_nullable_test.cpp reverted from this PR, local 1FE+4BE first local-shuffle repro passed 16x100 concurrent executions
- Manual test: with column_nullable.cpp and column_nullable_test.cpp reverted from this PR, local 1FE+4BE second local-shuffle repro matched local-off baseline for 100 iterations
- Behavior changed: No
- Does this need documentation: No
761a984 to
5078f25
Compare
|
run buildall |
|
run buildall |
TPC-H: Total hot run time: 29109 ms |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
TPC-DS: Total hot run time: 169655 ms |
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
|
/review |
There was a problem hiding this comment.
Review summary: I found no blocking or inline-worthy issues in this PR. The change is focused on avoiding in-place finalization of shared Variant columns during serialization and cast paths, and the implementation uses private finalized clones without changing the serialized format or cast semantics.
Critical checkpoint conclusions:
- Goal/test proof: The PR addresses shared-block mutation in Variant serialization/casts. Added BE unit tests cover serialization, block serialization, clone detachment, and Variant-to-int/string casts not finalizing the source column.
- Scope: The modification is small and limited to Variant serialization/cast handling plus targeted tests.
- Concurrency: No new threads or locks are introduced. The relevant concurrency issue is shared input block reuse after local exchange, and the clone-before-finalize approach avoids mutating shared columns.
- Lifecycle/static initialization: No new static/global lifecycle concerns found.
- Configuration/compatibility: No config items, protocol changes, or storage/serialization format changes found.
- Parallel paths: Both DataTypeVariant block serialization and Variant cast paths are updated. String/JSONB delegated casts run on a copied block with finalized input; root-value casts use the finalized clone root while preserving original null propagation.
- Conditional checks: The new finalization checks are tied to the concrete non-finalized Variant state and match the stated failure mode.
- Tests/results: Added unit tests are relevant. I did not run the full BE test suite in this review runner.
- Observability: No additional observability appears necessary for this internal correctness fix.
- Transactions/persistence/data writes: Not applicable.
- FE/BE variable passing: Not applicable.
- Performance: The change can clone/finalize non-finalized Variant columns where previous code mutated in place, but this is the intended safety tradeoff for shared blocks and is limited to non-finalized inputs.
User focus: No additional user-provided review focus was specified.
What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: Queries that evaluate VARIANT expressions after local exchange can share input blocks across downstream pipeline tasks. Variant casts and Variant serialization finalized source columns in-place, so one consumer could mutate a shared input column while another consumer still expected the original column shape and row count. This made local-shuffle query results unstable and could trigger later operators to observe changed Variant column contents or sizes. This change confines the fix to Variant handling by using private finalized Variant copies for cast and serialization paths instead of mutating the source column.
Release note
None
Check List (For Author)