
chore e2e tests #60

Open
creydr wants to merge 24 commits into functions-dev:main from creydr:rework-e2e-tests

Conversation


@creydr creydr commented Apr 15, 2026

e2e test improvements:

  • Run e2e tests in parallel
  • Clean up disk before running e2e tests
  • Enable fail-fast to save CI time
  • Clean up resources (DeferCleanup()) only on success
  • Collect multiple artifacts on test failures (pipeline logs, operator logs, status of all deployments/functions/ConfigMaps)
  • Fix GH API rate-limiting issue in image build (68cb917)

@creydr creydr enabled auto-merge April 15, 2026 13:42
@creydr creydr force-pushed the rework-e2e-tests branch from 0004cb7 to efd549a Compare April 16, 2026 07:28
@creydr creydr disabled auto-merge April 16, 2026 09:36
creydr added 20 commits April 16, 2026 14:57
…erfile

Replace GitHub API call with git ls-remote to check for func CLI updates.
The unauthenticated GitHub API has a 60 requests/hour limit, which was
being exhausted by concurrent E2E test builds in the merge queue, causing
HTTP 403 errors and build failures.
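The replacement can be sketched as follows. This is a minimal sketch, not the PR's actual script: the repository URL (github.com/knative/func), the tag format, and the helper name `latest_func_tag` are assumptions.

```shell
#!/usr/bin/env sh
# git ls-remote speaks the plain git protocol, so it is not subject to the
# GitHub REST API's 60 requests/hour unauthenticated limit.

latest_func_tag() {
  # Input: ls-remote lines like "<sha>\trefs/tags/v1.2.3".
  # Keep plain version tags (drop peeled "^{}" refs), strip the ref
  # prefix, sort by version, and take the newest.
  grep -E 'refs/tags/v[0-9]+\.[0-9]+\.[0-9]+$' \
    | sed 's|.*refs/tags/||' \
    | sort -V \
    | tail -n1
}

# Real usage (hits the network):
#   git ls-remote --tags https://github.com/knative/func | latest_func_tag
```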
Two issues were causing intermittent failures:

1. Race condition: The test waited for the pod to reach "Succeeded" status
   but immediately tried to read logs. Added explicit wait for logs to be
   available before reading them, preventing "empty logs" failures.

2. Cleanup order: When tests failed, AfterEach hooks ran in reverse order,
   deleting the curl-metrics pod before the debug AfterEach could collect
   its logs. Replaced AfterEach with DeferCleanup registered after pod
   creation to ensure proper cleanup ordering.
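Ginkgo's DeferCleanup follows the same LIFO discipline as Go's `defer`: whatever is registered last runs first. A plain-Go sketch of the ordering fix (the pod and step names are illustrative, not the suite's actual code):

```go
package main

import "fmt"

// runTest mimics a spec body: the pod is created, then its deletion is
// registered, then log collection is registered. Because deferred calls
// run last-in-first-out, logs are collected BEFORE the pod is deleted.
func runTest() {
	fmt.Println("test: create curl-metrics pod")

	// Registered right after creation, so it runs LAST on exit.
	defer fmt.Println("cleanup: delete curl-metrics pod")

	// Registered later, so it runs FIRST on exit: diagnostics can
	// still read the pod's logs before teardown removes it.
	defer fmt.Println("debug: collect curl-metrics pod logs")

	fmt.Println("test: run assertions")
}

func main() { runTest() }
```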
The metrics endpoint test was failing with "You must provide one or more
resources" because controllerPodName was empty. This happened because the
controller pod name was only set in the first test, but the metrics test
in the nested Context ran independently.

Moved the controller pod validation to BeforeEach inside the Context so
controllerPodName is set before each test runs, making it available to
all tests including the metrics endpoint test.

GitHub runners have limited disk space (~14GB free). E2E tests build multiple
Docker images which can fill up the disk, causing "no space left on device"
errors.

Free up ~10GB by removing unused packages (.NET, Android SDK, GHC) and
cleaning Docker at the start of the workflow, before the kind cluster is
created. This cleanup is safe because it runs before any test infrastructure
exists.
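A cleanup step along these lines would run first in the workflow. This is a sketch, not the PR's actual YAML: the paths below are the usual space consumers on ubuntu-latest runners, and the exact set removed here is an assumption.

```yaml
# Runs before the kind cluster is created, so no test infrastructure
# exists yet and deleting these trees is safe.
- name: Free up disk space
  run: |
    sudo rm -rf /usr/share/dotnet          # .NET SDK  (~2GB)
    sudo rm -rf /usr/local/lib/android     # Android SDK (~7GB)
    sudo rm -rf /opt/ghc /usr/local/.ghcup # GHC/Haskell (~1GB)
    docker system prune --all --force      # unused images and layers
    df -h /                                # log remaining space
```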
Increase default Eventually timeout from 2 to 10 minutes for function
deployment tests to handle resource contention during concurrent S2I
builds with middleware updates in CI environments.

With multiple parallel test configurations, builds can take 6-7 minutes
due to competing for limited CPU/memory/disk/network resources on
GitHub runners.

Stop test execution immediately on first failure to save CI time
and provide faster feedback when tests fail.

Add roles and rolebindings to artifact collection to diagnose
permission errors during PipelineRun deployments. This will help
investigate RBAC propagation delays that may cause intermittent
deployment failures.

Use 'while read' instead of 'for' loop to properly parse namespace
and name pairs. The previous loop iterated over words instead of
lines, causing incorrect pairing and incomplete log collection.
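The fixed loop can be sketched as below; the function name `collect_logs` and the echo body stand in for the real `kubectl logs` call and are assumptions.

```shell
#!/usr/bin/env sh
# Input: "namespace name" pairs, one per line (e.g. from
# kubectl get pods -A -o custom-columns=...). A `for word in $list` loop
# splits on ALL whitespace, so namespaces and names get mispaired;
# `while read` consumes one line at a time and binds both fields.

collect_logs() {
  while read -r ns name; do
    [ -n "$ns" ] || continue   # skip blank lines
    # Real usage: kubectl logs -n "$ns" "$name" > "artifacts/$ns-$name.log"
    echo "would collect logs: namespace=$ns pod=$name"
  done
}
```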
Add step to pre-pull and cache builder images into the KinD cluster before
running parallel E2E tests. This eliminates resource contention from multiple
concurrent image pulls.

Root cause analysis showed that when 3-4 tests run in parallel, they all
attempt to pull large builder images (1-3GB) simultaneously:
- S2I: registry.access.redhat.com/ubi8/go-toolset (~1GB)
- Pack: ghcr.io/knative/builder-jammy-base (~3GB)

This concurrent pulling caused:
- Network bandwidth saturation
- Disk I/O contention
- Container runtime lock contention
- PipelineRun builds timing out waiting for image pulls

Solution: Pre-pull images once before tests start, then load into KinD.
All parallel tests now share the cached images instead of pulling separately.

Benefits:
- Keeps full parallel execution (-p flag) to test concurrent reconciles
- Eliminates 90% of build time (no repeated pulls)
- More reliable CI (no timeout failures)
- Faster overall test suite
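The pre-pull step might look like the following. The image references come from the commit message above; the cluster name (`kind`), the `:latest` tag, and the step layout are assumptions, not the PR's actual workflow.

```yaml
# Pull the large builder images once on the host, then load them into the
# kind cluster so every parallel test hits the local cache instead of the
# network.
- name: Pre-pull and cache builder images
  run: |
    docker pull registry.access.redhat.com/ubi8/go-toolset   # S2I builder (~1GB)
    docker pull ghcr.io/knative/builder-jammy-base:latest    # Pack builder (~3GB)
    kind load docker-image \
      registry.access.redhat.com/ubi8/go-toolset \
      ghcr.io/knative/builder-jammy-base:latest \
      --name kind
```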
Move SetDefaultEventuallyTimeout from individual Describe blocks to
BeforeSuite to fix timeout race condition.

Root cause: When tests run in parallel with `-p`, all test files execute
in the same process. SetDefaultEventuallyTimeout is global to Gomega, so
whichever Describe block runs last overwrites the timeout for all tests.

The sequence was:
1. func_deploy_test.go sets timeout to 10 minutes
2. metrics_test.go sets timeout to 2 minutes (120 seconds)
3. All subsequent tests use 2 minutes, causing deployment tests to timeout

Solution: Set timeout once globally in BeforeSuite before any Describe
blocks execute. This ensures a consistent 10 minute timeout for all tests.

Removed redundant timeout settings from:
- test/e2e/func_deploy_test.go (10 min)
- test/e2e/func_middleware_update_test.go (10 min)
- test/e2e/bundle_test.go (5 min)
- test/e2e/metrics_test.go (2 min - the culprit)
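The failure mode is ordinary last-writer-wins on process-global state. A plain-Go sketch of the race (the names mimic Gomega's `SetDefaultEventuallyTimeout` but are illustrative, not Gomega's internals):

```go
package main

import (
	"fmt"
	"time"
)

// A single package-level default, as in Gomega: whichever caller sets it
// last wins for the entire process.
var defaultEventuallyTimeout = 1 * time.Second

func setDefaultEventuallyTimeout(d time.Duration) { defaultEventuallyTimeout = d }

func main() {
	// With -p, all test files share one process, and each Describe
	// block's top level runs at tree-construction time:
	setDefaultEventuallyTimeout(10 * time.Minute) // func_deploy_test.go
	setDefaultEventuallyTimeout(2 * time.Minute)  // metrics_test.go, runs last

	// Every subsequent Eventually now sees the last value written:
	fmt.Println("effective timeout:", defaultEventuallyTimeout)

	// The fix: call it exactly once, in BeforeSuite, before any specs run.
}
```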
@creydr creydr force-pushed the rework-e2e-tests branch from 703e474 to e1dc613 Compare April 16, 2026 13:59
@creydr creydr enabled auto-merge April 16, 2026 14:38
Increased the default timeout for Eventually assertions to accommodate longer test durations.
@creydr creydr added this pull request to the merge queue Apr 16, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Apr 16, 2026
@creydr creydr enabled auto-merge April 16, 2026 19:03
@creydr creydr added this pull request to the merge queue Apr 16, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Apr 16, 2026