Add manually-triggered workflow for expensive (live-LLM) tests #433

Open
rajeee wants to merge 1 commit into main from expensive-test-workflow

@rajeee rajeee commented May 23, 2026

Adds a workflow_dispatch-only workflow to run tests marked expensive (live, billable LLM calls) on demand from the Actions tab. Registers the expensive pytest marker, deselects it by default via addopts, and adds a tests-expensive pixi task.

Split out from #432 so the workflow lands on main first — workflow_dispatch workflows only become triggerable once the YAML is on the default branch. The tests it runs come with #432.
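For reference, once the YAML is on main, a dispatch from the command line might look like the sketch below. It assumes the GitHub CLI (`gh`) is installed and authenticated. The helper name `dispatch_expensive_tests`, the `DRY_RUN` switch, and any model value passed in are hypothetical; the workflow file name and the `model` / `test_filter` inputs are taken from this PR.

```shell
#!/bin/sh
# Sketch only: dispatch the expensive-tests workflow from a terminal.
# Assumes the GitHub CLI (gh) is installed and authenticated.
# dispatch_expensive_tests and DRY_RUN are hypothetical names for illustration.
set -eu

dispatch_expensive_tests() {
    model="$1"        # value for the workflow's `model` input
    filter="${2:-}"   # optional value for the `test_filter` input

    # Build the command as positional parameters so each value stays one word.
    set -- gh workflow run expensive-tests.yml --ref main -f "model=$model"
    if [ -n "$filter" ]; then
        set -- "$@" -f "test_filter=$filter"
    fi

    # Dry run by default: print the command instead of executing it.
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}
```

Calling `dispatch_expensive_tests <model> [filter]` prints the command it would run; setting `DRY_RUN=0` would execute it instead.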

Requires AZURE_OPENAI_API_KEY and AZURE_OPENAI_ENDPOINT repo secrets to run (fails fast with a clear message otherwise).

@rajeee rajeee requested review from castelao and ppinchuk as code owners May 23, 2026 00:23
Copilot AI review requested due to automatic review settings May 23, 2026 00:23

Copilot AI left a comment

Pull request overview

This PR adds infrastructure to safely run opt-in “expensive” pytest tests (those that make live, billable LLM calls) via a manually triggered GitHub Actions workflow, while ensuring they are deselected by default in normal test runs.

Changes:

  • Registers an expensive pytest marker and deselects it by default via global addopts.
  • Adds a tests-expensive Pixi task to run only expensive-marked integration tests.
  • Introduces a workflow_dispatch-only GitHub Actions workflow to run these tests on demand with optional filtering.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| pyproject.toml | Adds expensive marker registration and deselects expensive tests by default via pytest addopts. |
| pixi.toml | Adds a tests-expensive task to run integration tests marked expensive. |
| .github/workflows/expensive-tests.yml | Adds a manual-only workflow to run expensive (live-LLM) tests with configurable model and optional test filtering. |

Comment on lines +36 to +42
- uses: prefix-dev/setup-pixi@1b2de7f3351f171c8b4dfeb558c639cb58ed4ec0 # v0.9.5
  with:
    pixi-version: v0.62.2
    locked: true
    cache: true
    cache-write: false
    environments: pdev
Comment on lines +57 to +60
if [ -n "${{ github.event.inputs.test_filter }}" ]; then
  pixi run -e pdev pytest -rapP -vv -s --log-cli-level=INFO \
    -m expensive -k "${{ github.event.inputs.test_filter }}" \
    tests/python/integration
Comment on lines +45 to +55
env:
  AZURE_OPENAI_API_KEY: ${{ secrets.AZURE_OPENAI_API_KEY }}
  AZURE_OPENAI_ENDPOINT: ${{ secrets.AZURE_OPENAI_ENDPOINT }}
  AZURE_OPENAI_VERSION: ${{ secrets.AZURE_OPENAI_VERSION }}
  COMPASS_DATE_TEST_MODEL: ${{ github.event.inputs.model }}
run: |
  if [ -z "${AZURE_OPENAI_API_KEY}" ] || [ -z "${AZURE_OPENAI_ENDPOINT}" ]; then
    echo "::error::AZURE_OPENAI_API_KEY / AZURE_OPENAI_ENDPOINT secrets are not set."
    echo "Add them in repo Settings -> Secrets and variables -> Actions."
    exit 1
  fi
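If the list of required secrets grows, the fail-fast check above could be factored into a small helper that reports every missing name at once. The sketch below is one possible shape, not part of this PR; `require_env` is a hypothetical name, and the `::error::` prefix is the same GitHub Actions workflow command used in the snippet above.

```shell
#!/bin/sh
# Sketch only: report all unset or empty variables in one error line.
# require_env is a hypothetical helper, not something this PR defines.
set -eu

require_env() {
    missing=""
    for name in "$@"; do
        # Indirect lookup; the names come from the caller, never from user input.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            missing="$missing $name"
        fi
    done
    if [ -n "$missing" ]; then
        echo "::error::Missing required variables:$missing"
        return 1
    fi
}

# Possible use inside the step:
#   require_env AZURE_OPENAI_API_KEY AZURE_OPENAI_ENDPOINT
```

Under `set -e`, a non-zero return from `require_env` at the top level of the step would stop the script before any billable call is made.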
Comment on lines +56 to +62
pixi reinstall -e pdev INFRA-COMPASS
if [ -n "${{ github.event.inputs.test_filter }}" ]; then
  pixi run -e pdev pytest -rapP -vv -s --log-cli-level=INFO \
    -m expensive -k "${{ github.event.inputs.test_filter }}" \
    tests/python/integration
else
  pixi run -e pdev tests-expensive
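Because `${{ github.event.inputs.test_filter }}` is expanded into the script text before the shell runs, a common hardening pattern is to pass such inputs through the step's `env:` block and read them as ordinary shell variables. The following is a minimal sketch under that assumption. `TEST_FILTER` and `build_pytest_args` are hypothetical names that do not appear in this PR; the pytest flags are copied from the snippet above.

```shell
#!/bin/sh
# Sketch only: build the pytest argument list from an environment variable
# instead of interpolating the workflow input into the script text.
# TEST_FILTER and build_pytest_args are hypothetical names.
set -eu

build_pytest_args() {
    filter="${1:-}"
    # Start from the flags used by the workflow's filtered branch.
    set -- -rapP -vv -s --log-cli-level=INFO -m expensive
    if [ -n "$filter" ]; then
        # The filter stays a single argument, whatever characters it contains.
        set -- "$@" -k "$filter"
    fi
    set -- "$@" tests/python/integration
    # Print one argument per line for inspection.
    printf '%s\n' "$@"
}

build_pytest_args "${TEST_FILTER:-}"
```

Inside a real step, the same `set --` pattern could end with `pixi run -e pdev pytest "$@"` rather than printing the arguments.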
@codecov-commenter
Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 56.11%. Comparing base (eab9df2) to head (aa15225).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #433   +/-   ##
=======================================
  Coverage   56.11%   56.11%           
=======================================
  Files          63       63           
  Lines        6080     6080           
  Branches      591      591           
=======================================
  Hits         3412     3412           
  Misses       2598     2598           
  Partials       70       70           
Flag Coverage Δ
unittests 56.11% <ø> (ø)

Flags with carried forward coverage won't be shown.


3 participants