[regression](load) add some p0 case for load strict mode#62478
[regression](load) add some p0 case for load strict mode#62478jacktengg merged 3 commits intoapache:masterfrom
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
|
PR approved by at least one committer and no changes requested. |
|
PR approved by anyone and no changes requested. |
|
review |
|
/review |
There was a problem hiding this comment.
Finding:
regression-test/suites/load_p0/insert/test_insert_strict_mode_and_filter_ratio.groovy: the new strict-modeinsert into selectcase forNULL -> NOT NULLis asserting the wrong failure contract. In FE, successfulinsert into selectjobs callAbstractInsertExecutor.checkStrictModeAndFilterRatio()after execution, and that path raisesInsert has filtered data in strict modewheneverenable_insert_strict=trueandfilteredRows > 0(fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/commands/insert/AbstractInsertExecutor.java:220-225). The paired non-strict case in this same test (insert_max_filter_ratio=0.2) already proves this scenario is counted as filtered rows rather than failing earlier with an executor error, so this strict assertion would lock in behavior that contradicts the current implementation.
Critical checkpoint conclusions:
- Goal of the task: add regression coverage for strict mode and filter-ratio behavior in stream load and
insert into select. The PR mostly does that, but the strictNULL -> NOT NULLinsert into selectassertion does not match the current code path, so the goal is not fully met yet. - Modification size and focus: yes, the PR is focused on regression coverage only.
- Concurrency: not applicable; these changes are test-only and do not introduce concurrent code.
- Lifecycle, static init, config, compatibility: not applicable; no new lifecycle, config, protocol, or storage-format changes.
- Parallel code paths: checked against existing
VALUES, TVF, and FE insert-executor behavior. Oneinsert into selectbranch is inconsistent with the FE path that governs filtered-row handling. - Special conditions: the strict-mode and error-message distinction depends on whether the failure happens during execution or during post-exec filtered-row validation. This case is in the latter category.
- Test coverage: coverage is expanded well overall, but this blocking assertion should be corrected before relying on the suite.
- Test result files: the new
.outupdates are consistent with the new cases and appear auto-generated. - Observability, transaction, persistence, data write safety, performance: not applicable for this regression-only PR.
- Other issues: none beyond the blocking assertion above.
| sql """ | ||
| insert into test_insert_select_strict_mode_and_filter_ratio_dst select * from test_insert_select_strict_mode_and_filter_ratio_src; | ||
| """ | ||
| exception """Insert has too many filtered data""" |
There was a problem hiding this comment.
This assertion looks inconsistent with the current insert into select implementation. For this branch, the non-strict sibling just above already shows the job completes and reports filtered rows, so the strict-mode failure should come from AbstractInsertExecutor.checkStrictModeAndFilterRatio() in FE. That code path raises Insert has filtered data in strict mode, not Insert has too many filtered data, and it does not append a url check either.
| exception """Insert has too many filtered data""" | |
| exception """Insert has filtered data in strict mode""" |
|
/review |
There was a problem hiding this comment.
No blocking findings from code review.
Critical checkpoint conclusions:
- Goal / correctness: The PR expands regression coverage for load strict-mode and filter-ratio behavior, especially
INSERT INTO SELECTcases ininsert, tinyint overflow instream_load, and expected error-message alignment intvf. The added/updated suites and.outfiles are consistent with that goal. I did not see a contradictory expectation in the changed files. - Scope / focus: The modification is small and focused to regression assets only (
regression-test/suites,.out, and one CSV input). No unrelated production-code churn is mixed in. - Concurrency: Not applicable; these are regression scripts and golden outputs only.
- Lifecycle / static init: Not applicable.
- Configuration: No new config items are added. The changes only exercise existing session variables and load properties.
- Compatibility: No incompatible API, storage-format, or protocol changes are introduced.
- Parallel code paths: The PR touches the relevant load-path regression suites in scope (
insert,stream_load,tvf). Within this scope I did not find an obvious missing companion assertion that would make coverage inconsistent. - Special conditions: The tests intentionally cover strict-mode and filter-ratio thresholds with explicit expected failures/successes; the conditions are understandable from the case names and comments.
- Test coverage: Coverage is materially improved for strict-mode/filter-ratio combinations. For this PR type, regression tests are the right level. I could not verify execution in this runner.
- Modified test results: The updated
.outfiles match the new query tags added in the suites. I did not find a mismatched or missing golden-result section in the modified files. - Observability: No observability changes are needed for this test-only PR.
- Transaction / persistence / data-write correctness: No production logic changed, so there is no direct risk in FE/BE transaction or persistence behavior from this patch itself.
- Performance: No production hot path changed.
- Other issues: None identified during review.
Residual risk: since I did not execute the regression suite here, the remaining risk is limited to environment-dependent result drift rather than a clear logic problem in the patch itself.
|
run buildall |
|
run buildall |
|
PR approved by at least one committer and no changes requested. |
|
/review |
There was a problem hiding this comment.
No blocking findings from code review.
Critical checkpoint conclusions:
- Goal / correctness: The PR goal is to expand strict-mode and filter-ratio regression coverage for load-related paths, and to make the exception assertions match the cloud-mode truncated error contract. The changed suites and golden files are consistent with that goal, including the new
INSERT INTO SELECTcases, the tinyint-overflowstream_loadcases, and the updatedtvf/insertexception checks. - Scope / focus: The modification is small and focused to regression assets only (
regression-test/suites,.out, and one CSV input file). No unrelated production-code changes are mixed in. - Concurrency: Not applicable; these changes are test-only and do not introduce concurrent code.
- Lifecycle / static initialization: Not applicable.
- Configuration: No new config items are added. The tests only exercise existing session variables and load properties.
- Compatibility: No incompatible API, protocol, symbol, or storage-format change is introduced.
- Parallel code paths: The PR updates the relevant sibling suites in scope (
insert,stream_load,tvf) and the expectations are consistent with the paths being exercised. - Special conditions: The cloud-mode error truncation is a non-intuitive case, and the added comments explain why the assertions must match
first_error_msgfragments instead of relying on the older generic load-failure text. - Test coverage: Coverage is materially improved for strict-mode /
max_filter_ratiocombinations, especially forINSERT INTO SELECTand tinyint overflow. I did not execute the suites locally in this runner, but the PR checks forP0 Regression,cloud_p0, andNonConcurrent Regressionare passing on the head SHA. - Modified test results: The updated
.outfiles match the new query tags and expected result sets in the changed suites. - Observability / transaction / persistence / data-write correctness / FE-BE variable passing / performance: Not applicable for this regression-only PR.
- Other issues: None identified during review.
Residual risk: limited to environment-dependent regression drift rather than an identified logic problem in this patch.
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)