…erfile Replace GitHub API call with git ls-remote to check for func CLI updates. The unauthenticated GitHub API has a 60 requests/hour limit, which was being exhausted by concurrent E2E test builds in the merge queue, causing HTTP 403 errors and build failures.
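As a rough illustration of the approach, the sketch below parses `git ls-remote --tags` output and picks the highest `vMAJOR.MINOR.PATCH` tag. It is written in Go for illustration only; the actual change lives in the build files and is not shown here. The helper names (`parseVersion`, `less`, `latestTag`) and the sample data are assumptions, not code from this PR.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strconv"
	"strings"
)

// parseVersion turns "v1.2.3" into [1 2 3]; ok is false for any other shape.
func parseVersion(tag string) (v [3]int, ok bool) {
	parts := strings.Split(strings.TrimPrefix(tag, "v"), ".")
	if len(parts) != 3 {
		return v, false
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return v, false
		}
		v[i] = n
	}
	return v, true
}

// less reports whether version a sorts before version b.
func less(a, b [3]int) bool {
	for i := range a {
		if a[i] != b[i] {
			return a[i] < b[i]
		}
	}
	return false
}

// latestTag scans `git ls-remote --tags` output (one "<sha>\trefs/tags/<tag>"
// entry per line) and returns the highest version tag, or "" if none matches.
func latestTag(lsRemoteOutput string) string {
	best, bestTag := [3]int{}, ""
	for _, line := range strings.Split(lsRemoteOutput, "\n") {
		fields := strings.Fields(line)
		if len(fields) != 2 {
			continue
		}
		// Annotated tags also appear as peeled "<tag>^{}" entries; treat both alike.
		tag := strings.TrimSuffix(strings.TrimPrefix(fields[1], "refs/tags/"), "^{}")
		if v, ok := parseVersion(tag); ok && (bestTag == "" || less(best, v)) {
			best, bestTag = v, tag
		}
	}
	return bestTag
}

func main() {
	// Made-up sample output so the program runs offline.
	out := "1111\trefs/tags/v1.2.0\n2222\trefs/tags/v1.10.0\n"
	if len(os.Args) > 1 {
		// With a repository URL argument, query the remote. No GitHub API
		// call is involved, so the API rate limit does not apply.
		b, err := exec.Command("git", "ls-remote", "--tags", os.Args[1]).Output()
		if err != nil {
			fmt.Fprintln(os.Stderr, "git ls-remote failed:", err)
			os.Exit(1)
		}
		out = string(b)
	}
	fmt.Println(latestTag(out))
}
```

Comparing versions numerically rather than as strings matters here: a plain string sort would place `v1.10.0` before `v1.2.0`.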
Two issues were causing intermittent failures:
1. Race condition: The test waited for the pod to reach "Succeeded" status but immediately tried to read logs. Added explicit wait for logs to be available before reading them, preventing "empty logs" failures.
2. Cleanup order: When tests failed, AfterEach hooks ran in reverse order, deleting the curl-metrics pod before the debug AfterEach could collect its logs. Replaced AfterEach with DeferCleanup registered after pod creation to ensure proper cleanup ordering.
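The "wait for logs" fix in the first item can be sketched as a small polling helper. This is a standard-library-only illustration, not the test code from this PR: `waitForLogs` and the fake `fetch` function are hypothetical names, and the real test presumably uses Gomega's `Eventually` instead.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// waitForLogs calls fetch until it returns non-empty logs or the attempts
// run out. A pod can report "Succeeded" slightly before its logs are
// readable, so an empty result is treated as "not ready yet".
func waitForLogs(fetch func() (string, error), attempts int, interval time.Duration) (string, error) {
	var lastErr error
	for i := 0; i < attempts; i++ {
		logs, err := fetch()
		if err == nil && logs != "" {
			return logs, nil
		}
		lastErr = err
		time.Sleep(interval)
	}
	if lastErr == nil {
		lastErr = errors.New("logs still empty")
	}
	return "", fmt.Errorf("gave up after %d attempts: %w", attempts, lastErr)
}

func main() {
	calls := 0
	// Fake log source: empty on the first two calls, populated afterwards.
	fetch := func() (string, error) {
		calls++
		if calls < 3 {
			return "", nil
		}
		return "metrics output", nil
	}
	logs, err := waitForLogs(fetch, 5, 10*time.Millisecond)
	fmt.Println(logs, err, calls)
}
```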
The metrics endpoint test was failing with "You must provide one or more resources" because controllerPodName was empty. This happened because the controller pod name was only set in the first test, but the metrics test in the nested Context ran independently. Moved the controller pod validation to BeforeEach inside the Context so controllerPodName is set before each test runs, making it available to all tests including the metrics endpoint test.
GitHub runners have limited disk space (~14GB free). E2E tests build multiple Docker images which can fill up the disk, causing "no space left on device" errors. Free up ~10GB by removing unused packages (.NET, Android SDK, GHC) and cleaning Docker at the start of the workflow, before the kind cluster is created. This cleanup is safe because it runs before any test infrastructure exists.
Increase default Eventually timeout from 2 to 10 minutes for function deployment tests to handle resource contention during concurrent S2I builds with middleware updates in CI environments. With multiple parallel test configurations, builds can take 6-7 minutes due to competing for limited CPU/memory/disk/network resources on GitHub runners.
Stop test execution immediately on first failure to save CI time and provide faster feedback when tests fail.
Add roles and rolebindings to artifact collection to diagnose permission errors during PipelineRun deployments. This will help investigate RBAC propagation delays that may cause intermittent deployment failures.
Use 'while read' instead of 'for' loop to properly parse namespace and name pairs. The previous loop iterated over words instead of lines, causing incorrect pairing and incomplete log collection.
Add step to pre-pull and cache builder images into the KinD cluster before running parallel E2E tests. This eliminates resource contention from multiple concurrent image pulls.

Root cause analysis showed that when 3-4 tests run in parallel, they all attempt to pull large builder images (1-3GB) simultaneously:
- S2I: registry.access.redhat.com/ubi8/go-toolset (~1GB)
- Pack: ghcr.io/knative/builder-jammy-base (~3GB)

This concurrent pulling caused:
- Network bandwidth saturation
- Disk I/O contention
- Container runtime lock contention
- PipelineRun builds timing out waiting for image pulls

Solution: Pre-pull images once before tests start, then load into KinD. All parallel tests now share the cached images instead of pulling separately.

Benefits:
- Keeps full parallel execution (-p flag) to test concurrent reconciles
- Eliminates 90% of build time (no repeated pulls)
- More reliable CI (no timeout failures)
- Faster overall test suite
Move SetDefaultEventuallyTimeout from individual Describe blocks to BeforeSuite to fix timeout race condition.

Root cause: When tests run in parallel with `-p`, all test files execute in the same process. SetDefaultEventuallyTimeout is global to Gomega, so whichever Describe block runs last overwrites the timeout for all tests. The sequence was:
1. func_deploy_test.go sets timeout to 10 minutes
2. metrics_test.go sets timeout to 2 minutes (120 seconds)
3. All subsequent tests use 2 minutes, causing deployment tests to timeout

Solution: Set timeout once globally in BeforeSuite before any Describe blocks execute. This ensures a consistent 10 minute timeout for all tests.

Removed redundant timeout settings from:
- test/e2e/func_deploy_test.go (10 min)
- test/e2e/func_middleware_update_test.go (10 min)
- test/e2e/bundle_test.go (5 min)
- test/e2e/metrics_test.go (2 min - the culprit)
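The last-writer-wins behaviour described above can be modelled without Gomega. In this minimal sketch, `defaultTimeout` and `setDefaultTimeout` are hypothetical stand-ins for Gomega's process-wide default and `SetDefaultEventuallyTimeout`. The code only illustrates why setting the value once, suite-wide, gives a predictable result.

```go
package main

import (
	"fmt"
	"time"
)

// defaultTimeout stands in for a process-wide default such as the one
// Gomega keeps for Eventually. It is shared by every spec in the process.
var defaultTimeout = time.Second

// setDefaultTimeout mimics a global setter: the most recent caller wins.
func setDefaultTimeout(d time.Duration) { defaultTimeout = d }

// perFileSetup replays the sequence from the commit message: each file
// sets its own value, and the later call silently overrides the earlier one.
func perFileSetup() time.Duration {
	setDefaultTimeout(10 * time.Minute) // as in func_deploy_test.go
	setDefaultTimeout(2 * time.Minute)  // as in metrics_test.go
	return defaultTimeout
}

// suiteSetup sets the value once, before any spec would run.
func suiteSetup() time.Duration {
	setDefaultTimeout(10 * time.Minute)
	return defaultTimeout
}

func main() {
	fmt.Println("per-file setup leaves:", perFileSetup())
	fmt.Println("suite-wide setup leaves:", suiteSetup())
}
```

A test that genuinely needs a different limit can still pass an explicit timeout to its own `Eventually` call instead of changing the global default.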
Increased the default timeout for Eventually assertions to accommodate longer test durations.
e2e test improvements: