ci: Improve benchmarking tests with CodSpeed #236

Open
Copilot wants to merge 27 commits into main from copilot/improve-benchmarking-tests

Conversation

Contributor

Copilot AI commented Mar 25, 2026

Summary

Replace pytest-benchmark with pytest-codspeed for benchmarking, extend test coverage to full process lifecycle and run-only benchmarks, and parameterize across connector types.

Changes

  • pyproject.toml: Swap pytest-benchmark for pytest-codspeed>=4.4.0
  • tests/benchmark/test_benchmarking.py:
    • test_benchmark_process_lifecycle — benchmarks full init → run → destroy using @pytest.mark.benchmark
    • test_benchmark_process_run — benchmarks only process.run() using BenchmarkFixture.pedantic with setup/teardown callbacks and uvloop.run, excluding init and destroy from timing
    • Both parameterized over AsyncioConnector/LocalProcess, ZMQConnector/LocalProcess, and RayConnector/RayProcess
  • tests/benchmark/conftest.py: Added benchmark-specific ray_ctx fixture that overrides the session-scoped one from tests/conftest.py, using include_dashboard=False to prevent the Ray dashboard's MetricsHead module from timing out under CodSpeed instrumentation in CI.
  • .github/workflows/benchmarks.yaml: Replace the manual checkout-both-branches-and-compare workflow with CodSpeedHQ/action@v4 on ubuntu-latest in default mode. Add push-to-main and workflow_dispatch triggers for baseline tracking and backtesting, use OIDC auth (id-token: write), and add the RAY_ENABLE_UV_RUN_RUNTIME_ENV and PLUGBOARD_IO_READ_TIMEOUT env vars for the Ray-based benchmarks.
  • uv.lock: Updated to reflect dependency change
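The lifecycle benchmark described above might look roughly like the following sketch. The `Process` class here is a hypothetical stand-in for plugboard's real process/connector pairs, not the actual API; the point is the shape of the test — with pytest-codspeed installed, `@pytest.mark.benchmark` measures the entire test body, so init, run, and destroy are all timed.

```python
import asyncio

import pytest


class Process:
    """Hypothetical stand-in for a plugboard process; records lifecycle calls."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    async def init(self) -> None:
        self.calls.append("init")

    async def run(self) -> None:
        self.calls.append("run")

    async def destroy(self) -> None:
        self.calls.append("destroy")


@pytest.mark.benchmark  # with pytest-codspeed installed, the whole test body is measured
def test_benchmark_process_lifecycle() -> None:
    async def _lifecycle() -> Process:
        proc = Process()
        await proc.init()
        await proc.run()
        await proc.destroy()
        return proc

    proc = asyncio.run(_lifecycle())
    assert proc.calls == ["init", "run", "destroy"]
```

In the PR this pattern is additionally parameterized over the connector/process combinations listed above.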

- Replace pytest-benchmark with pytest-codspeed in test dependencies
- Rewrite benchmark tests to use @pytest.mark.benchmark decorator
- Add full process lifecycle benchmark (init + run + destroy)
- Parameterize benchmarks across AsyncioConnector and ZMQConnector
- Replace GitHub Actions workflow with CodSpeed integration
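For reference, the reworked workflow might look something like this sketch. The step names, the uv setup step, and the env var values are assumptions for illustration, not the PR's actual file; only the action, runner, triggers, permissions, and env var names come from the description above.

```yaml
# Hypothetical sketch of .github/workflows/benchmarks.yaml after this PR.
name: benchmarks

on:
  push:
    branches: [main]   # baseline tracking on the default branch
  pull_request:
  workflow_dispatch:   # manual runs for backtesting

permissions:
  contents: read
  id-token: write      # OIDC auth for CodSpeed

jobs:
  benchmarks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: astral-sh/setup-uv@v5      # assumed setup step
      - run: uv sync --all-extras
      - uses: CodSpeedHQ/action@v4       # default instrumentation mode
        with:
          run: uv run pytest tests/benchmark/ --codspeed
        env:
          RAY_ENABLE_UV_RUN_RUNTIME_ENV: "0"   # placeholder value
          PLUGBOARD_IO_READ_TIMEOUT: "60"      # placeholder value
```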

Co-authored-by: toby-coleman <13170610+toby-coleman@users.noreply.github.com>
Agent-Logs-Url: https://github.com/plugboard-dev/plugboard/sessions/b8d70f1d-7f6f-412b-9452-e70611a349de
Copilot AI changed the title from [WIP] Enhance benchmarking tests using codspeed to ci: Improve benchmarking tests with CodSpeed Mar 25, 2026
Copilot AI requested a review from toby-coleman March 25, 2026 07:47
…tion

- test_benchmark_process_run now uses benchmark.pedantic with setup to
  only time process.run(), excluding init
- Added RayConnector/RayProcess to both benchmark test parametrizations
- Added Ray env vars to CI workflow for benchmark tests

Co-authored-by: toby-coleman <13170610+toby-coleman@users.noreply.github.com>
Agent-Logs-Url: https://github.com/plugboard-dev/plugboard/sessions/2d017dc2-a948-4b46-8c61-c42be32507a4
Contributor Author

Copilot AI commented Mar 25, 2026

Just as a heads up, I was blocked by some firewall rules while working on your feedback. Details below.

Warning

Firewall rules blocked me from connecting to one or more addresses

I tried to connect to the following addresses, but was blocked by firewall rules:

  • metadata.google.internal
    • Triggering command: /home/REDACTED/work/plugboard/plugboard/.venv/bin/python /home/REDACTED/work/plugboard/plugboard/.venv/bin/python /home/REDACTED/work/plugboard/plugboard/.venv/lib/python3.12/site-packages/ray/dashboard/dashboard.py --host=127.0.0.1 --port=8265 --port-retries=50 --temp-dir=/tmp/ray --log-dir=/tmp/ray/session_2026-03-25_20-28-43_333217_4102/logs --session-dir=/tmp/ray/session_2026-03-25_20-28-43_333217_4102 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=127.0.0.1:33279 --cluster-id-hex=9ec9fda9fcb85e65d5dc26651d1db3efb8d117964fb0bb2db1d9c416 --node-ip-address=127.0.0.1 --stdout-filepath=/tmp/ray/session_2026-03-25_20-28-43_333217_4102/logs/dashboard.out --stderr-filepath=/tmp/ray/session_2026-03-25_20-28-43_333217_4102/logs/dashboard.err (dns block)
    • Triggering command: /home/REDACTED/work/plugboard/plugboard/.venv/bin/python /home/REDACTED/work/plugboard/plugboard/.venv/bin/python /home/REDACTED/work/plugboard/plugboard/.venv/lib/python3.12/site-packages/ray/dashboard/dashboard.py --host=127.0.0.1 --port=8265 --port-retries=50 --temp-dir=/tmp/ray --log-dir=/tmp/ray/session_2026-03-25_20-33-01_962890_5585/logs --session-dir=/tmp/ray/session_2026-03-25_20-33-01_962890_5585 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=127.0.0.1:37307 --cluster-id-hex=6c37b0969d30736700cb2e0a3ff6b3b023662ffa335ba1d9fb65db86 --node-ip-address=127.0.0.1 --stdout-filepath=/tmp/ray/session_2026-03-25_20-33-01_962890_5585/logs/dashboard.out --stderr-filepath=/tmp/ray/session_2026-03-25_20-33-01_962890_5585/logs/dashboard.err (dns block)
    • Triggering command: /home/REDACTED/work/plugboard/plugboard/.venv/bin/python3 /home/REDACTED/work/plugboard/plugboard/.venv/bin/python3 /home/REDACTED/work/plugboard/plugboard/.venv/lib/python3.12/site-packages/ray/dashboard/dashboard.py --host=127.0.0.1 --port=8265 --port-retries=50 --temp-dir=/tmp/ray --log-dir=/tmp/ray/session_2026-03-25_20-34-45_613600_6660/logs --session-dir=/tmp/ray/session_2026-03-25_20-34-45_613600_6660 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=127.0.0.1:34471 --cluster-id-hex=e65bcc378675174b4494fed2ef08f5c8492840459a2b9e0ece8361a3 --node-ip-address=127.0.0.1 --stdout-filepath=/tmp/ray/session_2026-03-25_20-34-45_613600_6660/logs/dashboard.out --stderr-filepath=/tmp/ray/session_2026-03-25_20-34-45_613600_6660/logs/dashboard.err (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Copilot AI requested a review from toby-coleman March 25, 2026 20:42
@toby-coleman toby-coleman marked this pull request as ready for review March 25, 2026 20:50
@codspeed-hq

codspeed-hq bot commented Mar 25, 2026

Congrats! CodSpeed is installed 🎉

🆕 6 new benchmarks were detected.

You will start to see performance impacts in the reports once the benchmarks are run from your default branch.


@codecov

codecov bot commented Mar 25, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


@toby-coleman
Contributor

Wait for CodSpeedHQ/pytest-codspeed#113 before merging this.

@toby-coleman
Contributor

@copilot Reinstate the run-only tests - take a look at c663755 to see what to add back in. Update the version of pytest-codspeed to >=4.4.0

Contributor Author

Copilot AI commented Apr 17, 2026

@copilot Reinstate the run-only tests - take a look at c663755 to see what to add back in. Update the version of pytest-codspeed to >=4.4.0

Reinstated the run-only tests based on c663755 and updated pytest-codspeed to >=4.4.0 in 85362c6. Also fixed the _run function signature (removed unused process parameter) and added a teardown callback for proper cleanup in f585742.

@toby-coleman
Contributor

@copilot change the benchmark tests back to using ubuntu-latest rather than codspeed macro, and don't use walltime mode - go back to default

Contributor Author

Copilot AI commented Apr 19, 2026

@copilot change the benchmark tests back to using ubuntu-latest rather than codspeed macro, and don't use walltime mode - go back to default

Done in 29feb10 — switched runs-on back to ubuntu-latest and removed mode: walltime so CodSpeed uses its default instrumentation mode.

@toby-coleman
Contributor

@copilot determine why benchmark tests in CI fail with RuntimeError: Module MetricsHead failed to start. Timeout after 30.0 seconds related to ray dashboard. Fix.

Contributor Author

Copilot AI commented Apr 19, 2026

@copilot determine why benchmark tests in CI fail with RuntimeError: Module MetricsHead failed to start. Timeout after 30.0 seconds related to ray dashboard. Fix.

Fixed in 76af665. The ray_ctx session fixture in tests/conftest.py uses include_dashboard=True, which causes the Ray dashboard's MetricsHead module to time out under CodSpeed's instrumented environment. Added a benchmark-specific tests/benchmark/conftest.py that overrides ray_ctx with include_dashboard=False. Also removed the explicit mode: simulation from the workflow (it was leftover from a previous change).
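The override described here might look like this sketch of tests/benchmark/conftest.py. The fixture name and the `include_dashboard=False` flag come from the PR; everything else (scope, lazy import, shutdown handling) is an assumption for illustration.

```python
import pytest


@pytest.fixture(scope="session")
def ray_ctx():
    """Benchmark-only override of the session-scoped ray_ctx fixture.

    Disabling the dashboard avoids the MetricsHead startup timeout seen
    under CodSpeed's instrumented CI environment.
    """
    import ray  # imported lazily so collection works without ray installed

    ctx = ray.init(include_dashboard=False)
    yield ctx
    ray.shutdown()
```

Because pytest resolves fixtures from the nearest conftest.py, placing this in tests/benchmark/ shadows the session-scoped fixture from tests/conftest.py for benchmark tests only.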

Warning

Firewall rules blocked me from connecting to one or more addresses

I tried to connect to the following addresses, but was blocked by firewall rules:

  • astral.sh
    • Triggering command: /usr/bin/curl curl -LsSf REDACTED (dns block)

If you need me to access, download, or install something from one of these locations, you can either:
