Skip to content

Introduce fakes and contract tests for AWS Lambda integration#763

Draft
tooky wants to merge 10 commits intomainfrom
introduce-lambda-fakes-and-contracts
Draft

Introduce fakes and contract tests for AWS Lambda integration#763
tooky wants to merge 10 commits intomainfrom
introduce-lambda-fakes-and-contracts

Conversation

@tooky
Copy link
Copy Markdown
Contributor

@tooky tooky commented Apr 3, 2026

Context

Addresses #758. Our cloud provider integration tests all call real external services, causing flaky CI, slow feedback, and poor edge-case coverage. This PR establishes the pattern for fixing that, starting with AWS Lambda as the template.

The approach follows Steve's comment on #758: Fakes (not mocks) kept honest by contract tests that run the same suite against both the fake and the real implementation.

What this PR introduces

1. LambdaAPI interface (internal/aws/aws.go)

A narrow interface scoped to the two SDK methods we actually call:

type LambdaAPI interface {
    ListFunctions(ctx, params, optFns...) (*ListFunctionsOutput, error)
    GetFunctionConfiguration(ctx, params, optFns...) (*GetFunctionConfigurationOutput, error)
}

The real *lambda.Client satisfies this implicitly — no adapter needed. Internal helpers (getFilteredLambdaFuncs, getAndProcessOneLambdaFunc, getLambdaPackageDataFromClient) accept the interface instead of concrete types.

2. FakeLambdaClient (internal/aws/fake_lambda.go)

An in-memory implementation of LambdaAPI with:

  • Marker-based pagination (configurable page size)
  • Error responses for missing functions
  • GetFunctionConfigurationErr field for error injection in tests

3. Contract test suite (internal/aws/lambda_contract_test.go)

A shared runLambdaContractTests(t, client, existingFunctionName) function that exercises the behaviours we depend on:

  • ListFunctions returns results
  • Pagination via MaxItems + Marker
  • GetFunctionConfiguration for existing function returns config with CodeSha256 and LastModified
  • GetFunctionConfiguration for missing function returns error

This suite runs against both the fake (always) and real AWS (env-gated behind AWS_ACCESS_KEY_ID). If both pass, the fake is a trustworthy stand-in. If the real side drifts, the contract catches it.

4. Fake-backed unit tests (internal/aws/aws_test.go)

Filtering tests (11 cases): IncludeNames, IncludeNamesRegex, ExcludeNames, ExcludeNamesRegex, combined filters, multi-page pagination with filtering, empty results, invalid regex errors.

Orchestration tests (5 cases): Zip fingerprint decoding, Image raw CodeSha256, concurrent multi-function processing, empty function list, error propagation from GetFunctionConfiguration.

These all run without AWS credentials and complete in milliseconds.

5. Package-level factory for command-test injection (internal/aws/aws.go)

var NewLambdaClientFunc = defaultNewLambdaClient

func ResetLambdaClientFactory() { ... }

GetLambdaPackageData uses NewLambdaClientFunc instead of creating a client directly. This follows the existing codebase pattern (package-level globals for logger, kosliClient, global).

6. Command tests decoupled from AWS (cmd/kosli/snapshotLambda_test.go)

SetupTest injects a FakeLambdaClient seeded with two functions (cli-tests as Zip, cli-tests-docker as Image). TearDownTest resets the factory. All 14 test cases now run without AWS credentials — the requireAuthToBeSet / SkipIfEnvVarUnset pattern is removed entirely.

7. Trimmed integration tests

Lambda integration tests reduced from 8 cases to 3 focused smoke tests (invalid credentials, one Zip happy path, one Image happy path). Filtering and orchestration logic is now covered by fast fake-backed tests.

8. make test_smoke_aws Makefile target

Runs contract tests and smoke tests against real AWS — intended for use before releases:

make test_smoke_aws  # requires AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY

Test layers after this PR

Layer Runs when AWS creds needed Kosli server needed
Fake-backed unit tests (internal/aws/) Always No No
Contract tests — fake (internal/aws/) Always No No
Contract tests — real (internal/aws/) make test_smoke_aws Yes No
Smoke tests (internal/aws/) make test_smoke_aws Yes No
Command tests (cmd/kosli/) make test_integration No Yes

Next steps (tracked in TODO.md)

This PR establishes the pattern. The remaining cloud providers follow the same structure:

  1. ECS — highest complexity (nested clusters → services → tasks concurrency)
  2. S3 — paginator-based pattern differs from marker-based
  3. Azure Apps — multiple client interfaces (ARM + Container Registry)
  4. Docker — simplest (no pagination or concurrency)
  5. Kubernetes — semaphore-based concurrency, KIND cluster for contract tests

Each is broken down into slices in TODO.md.

How to review

The commits are structured as thin vertical slices — each is independently reviewable:

  1. 7416da85 — Pure refactoring: extract LambdaAPI interface, change signatures
  2. c9f70331 — Contract test suite wired to real AWS
  3. 9b892595FakeLambdaClient passes the same contract
  4. d3f7fa20 — 11 fake-backed filtering/pagination tests
  5. 4e9789f6 — 5 fake-backed orchestration tests
  6. 0c87bc5a — Trim integration tests from 8 → 3
  7. ef346fe0 — Factory injection into command tests, remove credential gating

🤖 Generated with Claude Code

@claude
Copy link
Copy Markdown
Contributor

claude bot commented Apr 3, 2026

Claude finished @tooky's task in 2m 50s —— View job


PR Review: Introduce fakes and contract tests for AWS Lambda integration

  • Phase 1: Static analysis (go build, go vet, go mod tidy) — all pass
  • Phase 2: Read and review changed files
  • Phase 3: Post inline comments on specific issues (6 inline comments)
  • Phase 4: Post summary review

Static Analysis

All clean — go build ./..., go vet ./..., and go mod tidy pass with no issues.

Summary

This is a well-structured PR that establishes a solid pattern for decoupling cloud provider tests from real AWS. The commit history reads like a textbook TDD progression — each slice is independently reviewable and leaves the codebase green. The test layer separation (contract, fake-backed unit, command, smoke) is well thought out.

What's done well

  • Contract test patternrunLambdaContractTests running against both fake and real implementations is the right way to keep fakes honest. This is a pattern worth replicating across ECS/S3/Azure.
  • Comprehensive test coverage — 11 filtering cases + 5 orchestration cases cover the important paths (pagination, regex, combined filters, concurrent processing, error propagation).
  • Clean interface extractionLambdaAPI is narrow (2 methods), the real *lambda.Client satisfies it implicitly, and the refactoring doesn't change any production behavior.
  • Command tests fully decoupledsnapshotLambda_test.go no longer needs AWS credentials, with proper TearDownTest cleanup.
  • make test_smoke_aws — good addition for pre-release validation against real AWS.

Issues found (all non-blocking)

I've posted 6 inline comments. The key items:

  1. Fake MaxItems handling (fake_lambda.go:33) — MaxItems is only respected when less than PageSize; the real API treats it as an upper bound regardless. Currently no impact but could cause subtle divergence.

  2. Contract test pagination fragility (lambda_contract_test.go:44) — The real-AWS pagination test will fail confusingly if the account ever has only 1 Lambda function. A t.Skip guard would help (echoing the earlier review).

  3. Error type fidelity (fake_lambda.go:79) — Fake returns fmt.Errorf instead of *types.ResourceNotFoundException. Fine as long as no production code type-asserts on the error, but worth a comment (echoing the earlier review).

  4. Package-level global concurrency (aws.go:156) — Safe today but worth keeping in mind as more providers adopt this factory pattern.

  5. Test helper could be extended (aws_test.go:599-614) — fakeLambdaClientWithFunctions always creates Zip types; orchestration tests build clients inline. Minor DRY opportunity.

Verdict

No blocking issues. The three items from the previous review (pagination guard, error fidelity comment, concurrency note) are still unresolved — all are non-blocking but worth addressing before this pattern is replicated to ECS/S3.

This PR establishes a strong foundation for the remaining cloud provider fakes tracked in TODO.md.


}, nil
}
}
return nil, fmt.Errorf("function not found: %s", *params.FunctionName)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit (non-blocking): The real AWS Lambda API returns a typed *types.ResourceNotFoundException for missing functions. This fake returns a plain fmt.Errorf. The contract test only asserts require.Error, so it won't catch this divergence.

This is fine as long as no production code type-asserts on the error. If that ever changes (e.g., errors.As(err, &types.ResourceNotFoundException{})), the fake would silently diverge. Worth a comment here or in the contract test to call out that error-type fidelity is intentionally not tested.

require.LessOrEqual(t, len(out.Functions), 1)

// Follow the marker to prove pagination works
require.NotNil(t, out.NextMarker, "expected NextMarker when more functions exist")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fragility note (non-blocking): This assertion assumes the real AWS account has at least 2 Lambda functions in eu-central-1. If someone deletes the second one, the real-AWS contract test will fail here with a confusing nil-pointer message. A guard like:

if out.NextMarker == nil {
    t.Skip("only 1 function in account; pagination not exercisable")
}

would make this more resilient to environment drift.


// NewLambdaClientFunc is the factory used by GetLambdaPackageData to create a
// LambdaAPI client. Tests can replace this to inject a FakeLambdaClient.
var NewLambdaClientFunc = defaultNewLambdaClient
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Observation (non-blocking): Package-level mutable globals for test injection are pragmatic here but worth noting: if internal/aws tests ever run with t.Parallel() at the package level, concurrent writes to NewLambdaClientFunc would race. Currently safe because go test serialises tests within a package by default, and the TearDownTest in snapshotLambda_test.go properly resets it. Just something to keep in mind as more providers adopt this pattern.

tooky and others added 10 commits April 9, 2026 12:12
Define a narrow LambdaAPI interface scoped to the two SDK methods we
actually use (ListFunctions, GetFunctionConfiguration). Refactor
getFilteredLambdaFuncs, getAndProcessOneLambdaFunc, and the new
getLambdaPackageDataFromClient to accept the interface instead of
*lambda.Client. The public GetLambdaPackageData creates the real
client and delegates — command layer is untouched.

This is Slice 1 of the fakes & contract tests work (#758).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Shared runLambdaContractTests function exercises the behaviours we
depend on: listing functions, marker-based pagination, getting
function config, and error on missing function.

Wired to real *lambda.Client in TestLambdaContract_RealAWS, env-gated
behind AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY. This establishes the
contract grounded in real AWS behaviour — the fake (next slice) must
pass the same suite.

Slice 2 of fakes & contract tests work (#758).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
FakeLambdaClient is an in-memory implementation of LambdaAPI with
marker-based pagination and error responses for missing functions.
It passes the same runLambdaContractTests suite that validates the
real *lambda.Client, proving it is a trustworthy stand-in.

Slice 3 of fakes & contract tests work (#758).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tests getFilteredLambdaFuncs with the FakeLambdaClient covering:
IncludeNames, IncludeNamesRegex, ExcludeNames, ExcludeNamesRegex,
combined exclude filters, multi-page pagination with filtering,
empty function lists, and invalid regex error handling.

These tests run without AWS credentials and complete in milliseconds.

Slice 4 of fakes & contract tests work (#758).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tests getLambdaPackageDataFromClient with the FakeLambdaClient
covering: Zip fingerprint decoding, Image raw CodeSha256, concurrent
multi-function processing, empty function list, and error propagation
from GetFunctionConfiguration.

Also adds GetFunctionConfigurationErr field to FakeLambdaClient for
error injection in tests.

Slice 5 of fakes & contract tests work (#758).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove filtering-focused integration test cases (IncludeNamesRegex,
ExcludeNames, ExcludeNamesRegex, combined filters, wrong region,
invalid regex) — these are now covered by fake-backed unit tests in
TestGetFilteredLambdaFuncs and TestGetLambdaPackageDataFromClient.

Keep three smoke tests: invalid credentials error, one Zip function
happy path, and one Image function happy path. These prove real AWS
SDK wiring works without duplicating logic tests.

Reduces Lambda integration tests from 8 cases to 3.

Slice 6 of fakes & contract tests work (#758).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add NewLambdaClientFunc package-level factory to internal/aws.
GetLambdaPackageData uses the factory instead of creating a client
directly. Tests replace the factory to inject a FakeLambdaClient.

snapshotLambda_test.go now injects the fake in SetupTest and resets
in TearDownTest. All test cases run without AWS credentials — the
requireAuthToBeSet/SkipIfEnvVarUnset pattern is removed entirely.

Also adds make test_smoke_aws target for running contract and smoke
tests against real AWS before release.

Slice 7 of fakes & contract tests work (#758).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Lists the next steps for ECS, S3, Azure Apps, Docker, and Kubernetes
following the pattern established by the Lambda work: interface →
contract tests (real first) → fake → unit tests → factory injection
into command tests → trim integration tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@tooky tooky force-pushed the introduce-lambda-fakes-and-contracts branch from f8b2aa0 to 2ff108a Compare April 9, 2026 11:13

func (f *FakeLambdaClient) ListFunctions(_ context.Context, params *lambda.ListFunctionsInput, _ ...func(*lambda.Options)) (*lambda.ListFunctionsOutput, error) {
pageSize := f.pageSize()
if params.MaxItems != nil && int(*params.MaxItems) < pageSize {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit (non-blocking): MaxItems is respected only when it's less than the configured page size. If MaxItems is larger than PageSize, the fake silently ignores it and caps at PageSize. The real AWS API respects MaxItems as an upper bound regardless of the default page size.

This won't affect current tests (the contract test uses MaxItems=1 with PageSize=1), but it could cause subtle divergence if future tests set MaxItems > PageSize. Consider:

Suggested change
if params.MaxItems != nil && int(*params.MaxItems) < pageSize {
if params.MaxItems != nil {
maxItems := int(*params.MaxItems)
if maxItems < pageSize {
pageSize = maxItems
}
}

Or equivalently just use min(pageSize, *params.MaxItems) — the current logic is functionally the same, just noting for future readers that the intent is "MaxItems is an upper bound".

require.LessOrEqual(t, len(out.Functions), 1)

// Follow the marker to prove pagination works
require.NotNil(t, out.NextMarker, "expected NextMarker when more functions exist")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fragility note (non-blocking): This assertion assumes the real AWS account has ≥2 Lambda functions. If someone deletes the extra function, the real-AWS contract test will fail with a confusing nil-pointer dereference. A guard would make this more resilient:

if out.NextMarker == nil {
    t.Skip("only 1 function in account; pagination not exercisable")
}

(Echoing the earlier review comment — just flagging it's still unresolved.)

}, nil
}
}
return nil, fmt.Errorf("function not found: %s", *params.FunctionName)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error fidelity (non-blocking): The real AWS SDK returns *types.ResourceNotFoundException for missing functions, but this returns a plain fmt.Errorf. The contract test only asserts require.Error, so this divergence is invisible.

This is fine as long as no production code type-asserts on the error. Consider adding a brief comment here (e.g. // Real AWS returns *types.ResourceNotFoundException) so future developers know this is an intentional simplification.

(Echoing earlier review — still unresolved.)


// NewLambdaClientFunc is the factory used by GetLambdaPackageData to create a
// LambdaAPI client. Tests can replace this to inject a FakeLambdaClient.
var NewLambdaClientFunc = defaultNewLambdaClient
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concurrency note (non-blocking, echoing earlier review): Package-level mutable global is pragmatic here, but worth noting: if internal/aws tests ever run with t.Parallel() at the package level, concurrent writes to NewLambdaClientFunc would race. Currently safe since go test serialises within a package by default, and TearDownTest resets it properly. As more providers adopt this pattern (ECS, S3, etc.), consider whether a sync.Mutex guard or constructor-injection approach would scale better.

Comment on lines +599 to +614
// helper to build a FakeLambdaClient with named functions for testing
func fakeLambdaClientWithFunctions(names ...string) *FakeLambdaClient {
fns := make([]types.FunctionConfiguration, len(names))
lastModified := "2024-01-15T10:30:00.000+0000"
codeSha256 := "abc123"
for i, name := range names {
n := name
fns[i] = types.FunctionConfiguration{
FunctionName: &n,
CodeSha256: &codeSha256,
LastModified: &lastModified,
PackageType: types.PackageTypeZip,
}
}
return &FakeLambdaClient{Functions: fns}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: The fakeLambdaClientWithFunctions helper always creates PackageTypeZip functions. Several TestGetLambdaPackageDataFromClient cases construct their own FakeLambdaClient inline to handle PackageTypeImage. Consider extending this helper with an optional package type parameter (or a companion fakeLambdaClientWithPackageType) to reduce the inline boilerplate in the orchestration tests. Not urgent — the inline construction is clear enough.

}

func (suite *SnapshotLambdaTestSuite) TearDownTest() {
aws.ResetLambdaClientFactory()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good practice: TearDownTest properly resets the factory after every test. This prevents test pollution even if a test panics (testify's suite runner handles deferred cleanup). Well done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant