Skip to content

fix: Preserve nested struct references on readback#718

Open
gustavodiaz7722 wants to merge 7 commits into
aws-controllers-k8s:mainfrom
gustavodiaz7722:preserve-nested-references
Open

fix: Preserve nested struct references on readback#718
gustavodiaz7722 wants to merge 7 commits into
aws-controllers-k8s:mainfrom
gustavodiaz7722:preserve-nested-references

Conversation

@gustavodiaz7722

Copy link
Copy Markdown
Member

Description

Cross-resource references that live inside a nested struct field are dropped from the resource's .spec after the controller reads the resource back from AWS. References resolve correctly (ACK.ReferencesResolved=True) and the resolved value is applied to AWS, but the *Ref field disappears from the spec after readback.

Root cause

References are preserved across reconciles by the generated ClearResolvedReferences, which is called on latest in the runtime's patchResourceMetadataAndSpec. It only clears a concrete value when the sibling *Ref field is present on the resource.

Setting a resource's fields from an API response (SetResource) rebuilds nested struct fields from scratch and reassigns the parent struct:

f16 := &svcapitypes.LambdaConfigType{}   // fresh struct, no *Ref fields
// ...populate concrete fields from the response...
ko.Spec.LambdaConfig = f16               // parent replaced -> *Ref fields lost

Because the response carries only the concrete value (e.g. an ARN) and has no concept of the *Ref field, the *Ref values the user supplied are discarded. With no *Ref field on latest, ClearResolvedReferences can't clear the concrete value and the subsequent merge patch writes the concrete value while dropping the *Ref field.

Top-level references are unaffected, because the *Ref field is a sibling that is never touched when setting fields from the response. Nested references break because the whole parent struct is reassigned. This is the same class of issue seen with ECR encryptionConfiguration.kmsKeyRef, EKS computeConfig.nodeRoleRef, CloudFront distributionConfig.webACLRef, and OpenSearch cognitoOptions.*Ref.

Change

Add a PreserveReferenceFields code generator that emits code to copy nested *Ref values back from the original object into the rebuilt resource after the response is mapped, creating parent structs as needed:

if r.ko.Spec.LambdaConfig != nil && r.ko.Spec.LambdaConfig.PreSignUpRef != nil {
    if ko.Spec.LambdaConfig == nil {
        ko.Spec.LambdaConfig = &svcapitypes.LambdaConfigType{}
    }
    ko.Spec.LambdaConfig.PreSignUpRef = r.ko.Spec.LambdaConfig.PreSignUpRef
}

This makes nested references behave like top-level references: latest carries both the *Ref and the concrete value, so ClearResolvedReferences does its job and the patch preserves the reference.

The code is emitted for the create, update, read-one, and read-many output paths, guarded by HasReferenceFields. Top-level references are skipped (already preserved) and references nested within lists are skipped (their concrete values are restored by index during resolution).

Testing

  • go build ./..., gofmt, and go vet are clean.
  • Added Test_ReferenceFieldsPreservation_NestedReference; the full pkg/generate/... suite passes.
  • Regenerated a controller with nested struct references and confirmed correct output for single-level and deeply nested cases.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Cross-resource references nested inside struct fields were dropped from
the resource spec after the controller read the resource back from AWS.

References are preserved across reconciles by ClearResolvedReferences,
which only clears a concrete value when the sibling *Ref field is
present on the latest resource. Setting a resource's fields from an API
response rebuilds nested struct fields from scratch and reassigns the
parent struct, which discards the *Ref values the user supplied. The
response carries only the concrete value (e.g. an ARN) and has no
concept of the *Ref field, so the references were lost. Top-level
references are unaffected because the *Ref field is a sibling that is
never touched when setting fields from the response.

Generate code that copies nested *Ref values back from the original
object into the rebuilt resource after the response is mapped, creating
parent structs where needed. This makes nested references behave like
top-level references. The code is emitted for the create, update,
read-one, and read-many output paths. Top-level references are skipped
(already preserved) and references nested within lists are skipped
(their concrete values are restored by index during resolution).
@ack-prow ack-prow Bot requested review from jlbutler and knottnt June 29, 2026 20:36
@ack-prow

ack-prow Bot commented Jun 29, 2026

Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: gustavodiaz7722
Once this PR has been reviewed and has the lgtm label, please assign jlbutler for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@gustavodiaz7722

Copy link
Copy Markdown
Member Author

Manual Validation Results

Built this branch's ack-generate, regenerated a controller with a struct-nested reference, and verified end-to-end on a live EKS cluster (us-west-2) that the nested *Ref is now preserved across create, readback, and update.

Test subject

cognitoidentityprovider UserPool lambdaConfig.*Ref (struct-nested references to lambda.Function) — the same shape used in this PR's PreserveReferenceFields docstring. These fields failed before the fix (the *Ref was dropped from .spec and replaced with the concrete ARN after the Describe readback).

What the regeneration produced

Regenerating the controller with this branch added preservation blocks to pkg/resource/user_pool/sdk.go (~300 lines) for every lambdaConfig.*Ref, in both the readback path and the update path, e.g.:

// readback (sdkFind)
if r.ko.Spec.LambdaConfig != nil && r.ko.Spec.LambdaConfig.PreSignUpRef != nil {
    if ko.Spec.LambdaConfig == nil {
        ko.Spec.LambdaConfig = &svcapitypes.LambdaConfigType{}
    }
    ko.Spec.LambdaConfig.PreSignUpRef = r.ko.Spec.LambdaConfig.PreSignUpRef
}

// update (sdkUpdate)
if desired.ko.Spec.LambdaConfig != nil && desired.ko.Spec.LambdaConfig.PreSignUpRef != nil {
    ...
    ko.Spec.LambdaConfig.PreSignUpRef = desired.ko.Spec.LambdaConfig.PreSignUpRef
}

Results

Each field PASSes only if: ReferencesResolved=True, ResourceSynced=True, concrete field absent from .spec, and *Ref present with the right target.

Scenario preSignUpRef postConfirmationRef customMessageRef preAuthenticationRef postAuthenticationRef
Before fix (controller PR as-is)
After fix — create + readback
After fix — post-update

Create / readback (check_ref.py):

  ACK.ResourceSynced: True ✅
  ACK.ReferencesResolved: True ✅
  ✅ lambdaConfig.preSignUpRef: concrete=None, ref_name=ack-test-cognito-fn
  ✅ lambdaConfig.postConfirmationRef: concrete=None, ref_name=ack-test-cognito-fn
  ✅ lambdaConfig.customMessageRef: concrete=None, ref_name=ack-test-cognito-fn
  ✅ lambdaConfig.preAuthenticationRef: concrete=None, ref_name=ack-test-cognito-fn
  ✅ lambdaConfig.postAuthenticationRef: concrete=None, ref_name=ack-test-cognito-fn
RESULT: PASS ✅

Update path — changed a non-reference mutable field (policies.passwordPolicy.minimumLength 8 → 12) to force an UpdateUserPool, keeping the 5 refs. The update was applied (describe-user-pool shows MinimumLength: 12) and all refs survived:

  ACK.ResourceSynced: True ✅
  ACK.ReferencesResolved: True ✅
  ✅ lambdaConfig.preSignUpRef: concrete=None, ref_name=ack-test-cognito-fn
  ✅ lambdaConfig.postConfirmationRef: concrete=None, ref_name=ack-test-cognito-fn
  ✅ lambdaConfig.customMessageRef: concrete=None, ref_name=ack-test-cognito-fn
  ✅ lambdaConfig.preAuthenticationRef: concrete=None, ref_name=ack-test-cognito-fn
  ✅ lambdaConfig.postAuthenticationRef: concrete=None, ref_name=ack-test-cognito-fn
RESULT: PASS ✅

AWS-side confirmation (aws cognito-idp describe-user-pool) — the resolved values were applied and stayed applied after the update:

{
  "MinLen": 12,
  "Lambda": {
    "PreSignUp": "arn:aws:lambda:us-west-2:<acct>:function:ack-test-cognito-fn",
    "CustomMessage": "arn:aws:lambda:us-west-2:<acct>:function:ack-test-cognito-fn",
    "PostConfirmation": "arn:aws:lambda:us-west-2:<acct>:function:ack-test-cognito-fn",
    "PreAuthentication": "arn:aws:lambda:us-west-2:<acct>:function:ack-test-cognito-fn",
    "PostAuthentication": "arn:aws:lambda:us-west-2:<acct>:function:ack-test-cognito-fn"
  }
}

No perpetual diff after convergence ("desired resource state has changed" count = 0 over a 60s window once synced).

Scope note

I also regenerated elbv2 with this branch as a control: its generated code was unchanged, because its failing references (Listener.defaultActions[].targetGroupRef, Rule.actions[].targetGroupRef) are nested inside lists, which this fix intentionally skips (per the PreserveReferenceFields comments). So this PR resolves the struct-nested overwrite (cognito lambdaConfig, and by the same mechanism opensearch cognitoOptions, cloudfront distributionConfig, ECR encryptionConfiguration, etc.). List-nested references would need a separate follow-up.

Reproduction

  • Regenerated controller: SERVICE=cognitoidentityprovider AWS_SDK_GO_VERSION=v1.32.6 make build-controller
  • Built/pushed image with ko, deployed via Helm, created a lambda.Function + UserPool using only *Ref fields.
  • Verified with the standard check_ref.py (concrete absent + *Ref present + both conditions True).

Move the per-field *Ref restoration out of the inlined SDK templates and
into a single generated preserveReferenceFields method in references.go,
alongside the other reference passes (ResolveReferences,
ClearResolvedReferences). The create, update, read-one, and read-many
paths now call rm.preserveReferenceFields(<original>, ko) instead of
carrying a duplicated block, which keeps the generated sdk.go lean and
puts the logic next to the related reference helpers.

No behavior change: the generated copy logic is identical, only its
location and the source/target variable names (from/to) differ.
When the reference is set, its resolved concrete value was applied to
AWS and read back, so the target's ancestor structs already exist.
Instead of constructing empty ancestor structs on the target, nil-check
them in the guard condition and assign directly. This drops the
referenceParentGoType helper and its imports.
{{ $hookCode }}
{{- end }}
{{ GoCodeSetUpdateOutput .CRD "resp" "ko" 1 }}
{{- if .CRD.HasReferenceFields }}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a good number of resources that use a fully custom update function which won't be impacted by updating this template. Would it be possible to apply this in the ACK runtime?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, we should create a universal fix that will apply regardless if we have custom update function. I've refactored current implementation to use optional interface approach in runtime.

aws-controllers-k8s/runtime#257

{{ $hookCode }}
{{- end }}
{{ GoCodeSetReadOneOutput .CRD "resp" "ko" 1 }}
{{- if .CRD.HasReferenceFields }}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note, some fields may be set in either in the setOutputCustomMethodName function or in a sdk_read_one_post_set_output hook.

Generate an exported PreserveReferenceValues method on the resource
manager instead of injecting the *Ref copy into each SDK output path.
The ACK runtime calls this method from its single spec-patch chokepoint,
which also covers resources with fully custom sdk implementations that
bypass the generated output-setting code (and so never reached the
previously injected call).

The method satisfies the runtime's optional ReferenceValuesPreserver
interface. Remove the per-template calls from the create, update,
read-one, and read-many paths.
When a resource has references but none are nested inside structs (all
top-level), the preservation body is empty and the generated method
declared `fromKO` without using it, failing to compile (e.g. the IAM
controller's Group, Role, User, and InstanceProfile resources).

Capture the generated body in the template and emit the method only when
it is non-empty. Resources with only top-level references no longer get
the method, which is correct: top-level references are preserved by the
caller's deep copy and the runtime simply skips the optional interface.
@gustavodiaz7722

Copy link
Copy Markdown
Member Author

/retest

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants