Add manually-triggered workflow for expensive (live-LLM) tests by rajeee · Pull Request #433 · NatLabRockies/COMPASS

rajeee · 2026-05-23T00:23:44Z

Adds a workflow_dispatch-only workflow to run tests marked expensive (live, billable LLM calls) on demand from the Actions tab. Registers the expensive pytest marker, deselects it by default via addopts, and adds a tests-expensive pixi task.

Split out from #432 so the workflow lands on main first — workflow_dispatch workflows only become triggerable once the YAML is on the default branch. The tests it runs come with #432.
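
Once the workflow file is on main, it could presumably also be dispatched from the command line instead of the Actions tab. The sketch below assumes the GitHub CLI (`gh`) is installed and authenticated. `dispatch_expensive_tests` and `DRY_RUN` are hypothetical names used only for illustration; the `model` and `test_filter` inputs and the `expensive-tests.yml` filename come from this PR.

```shell
# Hypothetical wrapper around `gh workflow run`; not part of the PR.
# Usage: dispatch_expensive_tests <model> [test_filter]
dispatch_expensive_tests() {
  model="$1"
  filter="${2:-}"

  # Build the command as positional parameters so each value stays one argument.
  set -- gh workflow run expensive-tests.yml --ref main -f "model=$model"
  if [ -n "$filter" ]; then
    set -- "$@" -f "test_filter=$filter"
  fi

  # DRY_RUN (an assumed convenience flag) prints the arguments instead of running them.
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$@"
  else
    "$@"
  fi
}
```

Running it with `DRY_RUN=1` set prints one argument per line without contacting GitHub, which is a cheap way to confirm the inputs before spending on live LLM calls.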

Requires AZURE_OPENAI_API_KEY and AZURE_OPENAI_ENDPOINT repo secrets to run (fails fast with a clear message otherwise).

Copilot

Pull request overview

This PR adds infrastructure to safely run opt-in “expensive” pytest tests (those that make live, billable LLM calls) via a manually triggered GitHub Actions workflow, while ensuring they are deselected by default in normal test runs.

Changes:

Registers an expensive pytest marker and deselects it by default via global addopts.
Adds a tests-expensive Pixi task to run only expensive-marked integration tests.
Introduces a workflow_dispatch-only GitHub Actions workflow to run these tests on demand with optional filtering.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| `pyproject.toml` | Adds `expensive` marker registration and deselects expensive tests by default via pytest `addopts`. |
| `pixi.toml` | Adds a `tests-expensive` task to run integration tests marked `expensive`. |
| `.github/workflows/expensive-tests.yml` | Adds a manual-only workflow to run expensive (live-LLM) tests with configurable model and optional test filtering. |

+      - uses: prefix-dev/setup-pixi@1b2de7f3351f171c8b4dfeb558c639cb58ed4ec0 # v0.9.5
+        with:
+          pixi-version: v0.62.2
+          locked: true
+          cache: true
+          cache-write: false
+          environments: pdev



+        env:
+          AZURE_OPENAI_API_KEY: ${{ secrets.AZURE_OPENAI_API_KEY }}
+          AZURE_OPENAI_ENDPOINT: ${{ secrets.AZURE_OPENAI_ENDPOINT }}
+          AZURE_OPENAI_VERSION: ${{ secrets.AZURE_OPENAI_VERSION }}
+          COMPASS_DATE_TEST_MODEL: ${{ github.event.inputs.model }}
+        run: |
+          if [ -z "${AZURE_OPENAI_API_KEY}" ] || [ -z "${AZURE_OPENAI_ENDPOINT}" ]; then
+            echo "::error::AZURE_OPENAI_API_KEY / AZURE_OPENAI_ENDPOINT secrets are not set."
+            echo "Add them in repo Settings -> Secrets and variables -> Actions."
+            exit 1
+          fi
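
The check above reports both secrets in one message, so it does not say which one is unset. A possible variant, sketched here with a hypothetical `require_env` helper that is not part of the PR, names each missing variable:

```shell
# Hypothetical helper; not part of the PR.
# Usage: require_env NAME [NAME...]
# Returns 1 and prints a GitHub Actions error annotation if any named variable is empty or unset.
require_env() {
  missing=""
  for name in "$@"; do
    # Indirect lookup of the variable called $name (names are supplied by the caller, not by user input).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      missing="$missing $name"
    fi
  done
  if [ -n "$missing" ]; then
    echo "::error::Missing required secrets:$missing"
    return 1
  fi
}
```

In the workflow step this would be called as `require_env AZURE_OPENAI_API_KEY AZURE_OPENAI_ENDPOINT || exit 1`.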


+          pixi reinstall -e pdev INFRA-COMPASS
+          if [ -n "${{ github.event.inputs.test_filter }}" ]; then
+            pixi run -e pdev pytest -rapP -vv -s --log-cli-level=INFO \
+              -m expensive -k "${{ github.event.inputs.test_filter }}" \
+              tests/python/integration
+          else
+            pixi run -e pdev tests-expensive
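
Interpolating `${{ github.event.inputs.test_filter }}` directly into the script means the input text becomes part of the shell source before the step runs. A common hardening pattern is to pass the input through an environment variable and quote it. The sketch below assumes a hypothetical `TEST_FILTER` variable set from that input in the step's `env:` block; `run_expensive_tests` and `DRY_RUN` are likewise illustrative names, not part of the PR.

```shell
# Sketch only: TEST_FILTER, DRY_RUN and run_expensive_tests are assumed names.
run_expensive_tests() {
  if [ -n "${TEST_FILTER:-}" ]; then
    # The filter is passed as a single quoted argument, so its content is never parsed as shell.
    set -- pixi run -e pdev pytest -rapP -vv -s --log-cli-level=INFO \
      -m expensive -k "$TEST_FILTER" tests/python/integration
  else
    set -- pixi run -e pdev tests-expensive
  fi

  # DRY_RUN prints the command, one argument per line, instead of executing it.
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$@"
  else
    "$@"
  fi
}
```

The branch logic matches the diff above; only the way the filter reaches the shell differs.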


codecov-commenter · 2026-05-23T00:26:42Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 56.11%. Comparing base (eab9df2) to head (aa15225).

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #433   +/-   ##
=======================================
  Coverage   56.11%   56.11%           
=======================================
  Files          63       63           
  Lines        6080     6080           
  Branches      591      591           
=======================================
  Hits         3412     3412           
  Misses       2598     2598           
  Partials       70       70

Flag	Coverage Δ
unittests	`56.11% <ø> (ø)`


Add manually-triggered workflow for expensive (live-LLM) tests

aa15225

rajeee requested review from castelao and ppinchuk as code owners May 23, 2026 00:23

Copilot AI review requested due to automatic review settings May 23, 2026 00:23

Copilot started reviewing on behalf of rajeee May 23, 2026 00:23

Copilot AI reviewed May 23, 2026

rajeee wants to merge 1 commit into main from expensive-test-workflow
