[FIX] Integrate never-worked baseline handling into CI reporting #1058

Draft
Rahul-2k4 wants to merge 5 commits into CCExtractor:master from Rahul-2k4:feat/never-worked-baseline-integration

@Rahul-2k4 Rahul-2k4 commented Mar 11, 2026

Please prefix your pull request with one of the following: [FEATURE] [FIX] [IMPROVEMENT].

In raising this pull request, I confirm the following (please check boxes):

  • I have read and understood the contributors guide.
  • I have checked that another pull request for this purpose does not exist.
  • I have considered, and confirmed that this submission will be valuable to others.
  • I accept that this submission may not be used, and the pull request closed at the will of the maintainer.
  • I give this submission freely, and claim no ownership to its content.

My familiarity with the project is as follows (check one):

  • I have never used the project.
  • I have used the project briefly.
  • I have used the project extensively, but have not contributed previously.
  • I am an active contributor to the project.

Summary

  • integrate never-worked baseline handling into completed test reporting
  • restrict baseline refresh to trusted main-repo commit runs
  • make never-worked reporting platform-specific and tighten migration backfill behavior
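
A minimal sketch of the trust gate described in the second bullet, assuming a run record exposes its type, source repository, and completion state. The `Run` class, its field names, and the `MAIN_REPO` value are hypothetical placeholders and are not taken from this PR's code.

```python
from dataclasses import dataclass

# Hypothetical placeholder for the trusted upstream repository.
MAIN_REPO = "example-org/main-repo"


@dataclass
class Run:
    """Illustrative stand-in for a CI run record (not the project's model)."""
    test_type: str    # assumed values: "commit" or "pull_request"
    repository: str   # repository the run was triggered from
    completed: bool   # True once the run has finished reporting


def may_refresh_baseline(run: Run) -> bool:
    """Allow baseline updates only from completed main-repo commit runs."""
    return (
        run.completed
        and run.test_type == "commit"
        and run.repository == MAIN_REPO
    )
```

With a gate like this, a pull-request run or a run from a fork never reaches the code that mutates the baseline, whatever its results are.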

Linked Issue

Closes #1057

Test Plan

  • ./.venv/bin/python -m nose2 -v tests.test_ci.test_controllers.TestControllers.test_refresh_baseline_statuses_for_test_uses_full_result_status tests.test_ci.test_controllers.TestControllers.test_refresh_baseline_statuses_for_test_skips_pull_request_runs tests.test_ci.test_controllers.TestControllers.test_comment_info_separates_never_worked_failures tests.test_ci.test_controllers.TestControllers.test_comment_info_tracks_never_worked_per_platform tests.test_ci.test_controllers.TestControllers.test_progress_type_request_completed_refreshes_baseline_status tests.test_ci.test_controllers.TestControllers.test_progress_type_request_completed_does_not_refresh_baseline_for_pull_requests tests.test_ci.test_controllers.TestControllers.test_comments_successfully_in_passed_pr_test tests.test_ci.test_controllers.TestControllers.test_comments_successfuly_in_failed_pr_test tests.test_ci.test_controllers.TestControllers.test_comment_info_handles_variant_files_correctly tests.test_ci.test_controllers.TestControllers.test_comment_info_handles_invalid_variants_correctly tests.test_ci.test_controllers.TestControllers.test_update_build_badge tests.test_regression.test_baseline_status
  • ./.venv/bin/python -m nose2 -v tests.test_ci.test_controllers tests.test_regression.test_baseline_status (still has one unrelated existing failure in this environment: test_start_ci_empty_token, caused by blocked api.github.com access before token validation)

Rahul-2k4 and others added 5 commits March 11, 2026 12:54
Adds a baseline_status field to RegressionTest with three states:
- unknown      - newly added test, no run history
- never_worked - has run history, but never passed on any CCExtractor version
- established  - has at least one passing run (a failure here is a regression)

This makes it possible to distinguish true regressions from samples that
have never produced correct output, addressing the 'Never worked' test
state goal from the Sample Platform NG project brief.

Includes:
- BaselineStatus DeclEnum in mod_regression/models.py
- RegressionTest.update_baseline_status(passed) state-machine method
- RegressionTest.is_regression property
- Flask-Migrate migration (d1f3a9c2e8b7) with upgrade/downgrade
- 12 unit tests covering all state transitions and DB persistence
Co-authored-by: Rahul-2k4 <216878448+Rahul-2k4@users.noreply.github.com>
Fix variable shadowing and missing docstring in never-worked baseline integration
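
The three states in the first commit message can be read as a small state machine. The sketch below is one plausible reading, assuming a failure never demotes an established test; it uses the standard-library `Enum` rather than the project's `DeclEnum`, and the free functions `next_baseline_status` and `is_regression` are illustrative names, not the PR's methods.

```python
from enum import Enum


class BaselineStatus(Enum):
    UNKNOWN = "unknown"            # newly added test, no run history
    NEVER_WORKED = "never_worked"  # has run history, never passed
    ESTABLISHED = "established"    # has at least one passing run


def next_baseline_status(current: BaselineStatus, passed: bool) -> BaselineStatus:
    """Return the status after one trusted run (assumed transition rules)."""
    if passed:
        # Any pass establishes the baseline, whatever the previous state was.
        return BaselineStatus.ESTABLISHED
    if current is BaselineStatus.ESTABLISHED:
        # Assumption: an established test stays established after a failure.
        return BaselineStatus.ESTABLISHED
    # A failure with no prior pass means the test has run but never worked.
    return BaselineStatus.NEVER_WORKED


def is_regression(status: BaselineStatus, passed: bool) -> bool:
    """A failure counts as a regression only if the test once passed."""
    return status is BaselineStatus.ESTABLISHED and not passed
```

Three states and two outcomes give six transitions in total, of which only four change nothing or move forward; none moves a test back out of `ESTABLISHED`.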
@Rahul-2k4 Rahul-2k4 changed the title Integrate never-worked baseline handling into CI reporting [FIX] Integrate never-worked baseline handling into CI reporting Mar 11, 2026
cfsmp3 commented Jun 25, 2026

Contributor

Claude review:

Strengths (verified):

  • Trust gating is correct — baseline_status only mutates from completed main-repo commit runs, so PR/fork runs can't poison it.
  • Not affected by the C1 bug — I re-checked: it derives passed from the legacy get_test_results, whose missing-output detection is correct (flags test_error when expected outputs exist but no files). The C1 bug is in the new mod_api/status.py — a separate path. (Correcting my earlier comparison note, which wrongly tagged this.)
  • platform is properly wired to the template; comment_pr returns SUCCESS unless extra_failed_tests is non-empty, so never-worked tests correctly don't fail a PR.
  • Migration has a server_default + historical backfill.
  • Tests are thorough: all 6 transitions, is_regression ×3, DB persistence, refresh gating (skips PR runs), per-platform, comment bucketing. 208 pass.
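
To make the bucketing behaviour concrete, here is a rough sketch assuming baseline status is looked up per (test, platform) pair and that anything not marked never-worked counts as a real failure. `bucket_failures`, `pr_status`, and the dictionary layout are hypothetical and do not reproduce the actual `comment_pr` code.

```python
def bucket_failures(failed, baseline_by_key):
    """Split failed (test, platform) pairs into never-worked and real failures.

    `baseline_by_key` maps (test, platform) to a status string; a missing
    entry is treated as "unknown" and therefore counts as a real failure.
    """
    never_worked, extra_failed = [], []
    for key in failed:
        if baseline_by_key.get(key, "unknown") == "never_worked":
            never_worked.append(key)
        else:
            extra_failed.append(key)
    return never_worked, extra_failed


def pr_status(extra_failed):
    """Report SUCCESS unless at least one real failure remains."""
    return "FAILURE" if extra_failed else "SUCCESS"
```

Under these assumptions a test that is never-worked on one platform but has no recorded status on another is reported separately for each platform, and only the second case can fail the PR.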

Fix before merge:

  • I1 (med) — migration d1f3a9c2e8b7 revises head c8f3a2b1d4e5, which creates multiple Alembic heads alongside #1071 ([FEATURE] Added a "Never Worked" state to the tests) and #1003 ([IMPROVEMENT]: Add ATSC XMLTV regression test coverage via Alembic migration). Re-chain.
  • I2 (med) — verify the enum on MySQL: migration hardcodes sa.Enum(...name='baselinestatus') vs the model's BaselineStatus.db_type() (DeclEnum). SQLite tests hide any drift. Classic ORM/migration enum-mismatch risk.
  • I3 (med) — deploy-time gap: the backfill only marks established for tests that ever passed; never-passed tests stay unknown, and the never-worked rule requires != unknown. So right after deploy, genuinely-never-worked tests still land in extra_failed_tests (fail PRs) until the next main-repo commit run flips them to never_worked. Either document the one-cycle lag or add a conservative backfill (active tests with both last_passed_on_* NULL and a prior TestResult → never_worked).
  • I4/I5 (low) — get_test_results recomputed 2-3× per completed run; redundant double-guard.
  • I6 (low) — no "un-establish" (established stays established forever — intended, just document).
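
The conservative backfill suggested in I3 could be expressed as a pure decision function like the one below. This is only a sketch of the rule as worded in the review: the function name and parameters are hypothetical, and the per-platform `last_passed_on_*` columns are passed in as a plain list because their exact names are not given here.

```python
def backfill_status(active, last_passed_values, has_prior_result):
    """Pick an initial baseline status for an existing test during migration.

    `last_passed_values` holds the per-platform "last passed on" values;
    None means the test has never passed on that platform.
    """
    if any(value is not None for value in last_passed_values):
        # The test passed at least once somewhere, so a baseline exists.
        return "established"
    if active and has_prior_result:
        # It has run before and never passed: mark it never-worked up front
        # instead of waiting for the next main-repo commit run.
        return "never_worked"
    # No history to judge from, so leave the default.
    return "unknown"
```

Marking these tests during the migration would close the one-cycle window in which never-passed tests still count toward `extra_failed_tests`.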

So if #1058 is the chosen never-worked implementation, it's close to ready — the work is migration hygiene (I1/I2) and the deploy backfill decision (I3), not core logic.

@cfsmp3 cfsmp3 self-requested a review June 25, 2026 06:06

@cfsmp3 cfsmp3 left a comment

Contributor

See review

Development

Successfully merging this pull request may close these issues.

[BUG] never-worked baseline handling is not fully integrated into trusted CI reporting

3 participants