feat(eks): enhancements and fixes for EKS #766

Open
zdrapela wants to merge 3 commits into redhat-developer:main from zdrapela:eks-enhancements

@zdrapela
Member

@zdrapela zdrapela commented Apr 7, 2026

Summary

Self-managed node groups with spot support

  • Replace EKS managed node group with a self-managed Auto Scaling Group (ASG) for direct spot price control
  • Add spotPrice parameter to set maximum bid for spot instances
  • Use API authentication mode with access entries for self-managed node authentication

Cluster reliability fixes

  • Add service CIDR to nodeadm NodeConfig (required by AL2023 for proper pod networking without DescribeCluster API calls)
  • Deploy addons in phased dependency order: infrastructure addons (vpc-cni, kube-proxy, eks-pod-identity-agent) → coredns → remaining addons (aws-ebs-csi-driver)
  • Add WaitForCapacityTimeout, HealthCheckType, and HealthCheckGracePeriod to ASG so Pulumi waits for nodes to be InService before deploying addons
  • Add ResolveConflictsOnCreate and extended timeouts to all EKS addons
  • Make LB controller depend on coredns for DNS resolution
  • Remove unused NAT gateway (NatGatewayModeNone) since EKS uses only public subnets
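The phased addon order described above can be sketched as a small dependency-wave computation. This is only an illustration, not the PR's code: the `addonDeps` map and the `phases` helper are hypothetical, and the exact dependency edges are an assumption derived from the summary.

```go
package main

import (
	"fmt"
	"sort"
)

// addonDeps maps each addon to the addons that must be active before it.
// The grouping mirrors the PR summary; the exact edges are an assumption.
var addonDeps = map[string][]string{
	"vpc-cni":                nil,
	"kube-proxy":             nil,
	"eks-pod-identity-agent": nil,
	"coredns":                {"vpc-cni", "kube-proxy", "eks-pod-identity-agent"},
	"aws-ebs-csi-driver":     {"coredns"},
}

// phases groups addons into waves. An addon joins the first wave in which
// all of its dependencies were deployed by an earlier wave.
func phases(deps map[string][]string) ([][]string, error) {
	done := map[string]bool{}
	var out [][]string
	for len(done) < len(deps) {
		var wave []string
		for name, ds := range deps {
			if done[name] {
				continue
			}
			ready := true
			for _, d := range ds {
				if !done[d] {
					ready = false
					break
				}
			}
			if ready {
				wave = append(wave, name)
			}
		}
		if len(wave) == 0 {
			return nil, fmt.Errorf("dependency cycle or unknown dependency")
		}
		sort.Strings(wave) // deterministic order within a wave
		for _, n := range wave {
			done[n] = true
		}
		out = append(out, wave)
	}
	return out, nil
}

func main() {
	ps, err := phases(addonDeps)
	if err != nil {
		panic(err)
	}
	for i, p := range ps {
		fmt.Printf("phase %d: %v\n", i+1, p)
	}
}
```

With these edges, coredns lands in the second wave and aws-ebs-csi-driver in the third. In Pulumi's Go SDK, each wave would typically be chained to the previous one with the `pulumi.DependsOn` resource option.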

VPC endpoint extraction

  • Extract VPC endpoint creation from per-subnet code into a shared EndpointsRequest module
  • Endpoints are created once per VPC across all public subnets (required for multi-AZ EKS — AWS allows only one S3 gateway endpoint per VPC)
  • Integrates with the opt-in ServiceEndpoints pattern from "feat(aws): Optional service endpoints" (#754)
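A minimal sketch of the once-per-VPC idea, assuming a hypothetical `endpointSpec` type and `planEndpoints` helper (neither is taken from the PR, and the service list is illustrative). It shows that the number of planned endpoints depends on the number of services, not on the number of public subnets.

```go
package main

import "fmt"

// endpointSpec is a hypothetical description of one VPC endpoint to create.
type endpointSpec struct {
	Service   string   // short service name, e.g. "s3"
	Gateway   bool     // gateway endpoints attach to route tables, not subnets
	SubnetIDs []string // only populated for interface endpoints
}

// planEndpoints returns exactly one spec per service for the whole VPC,
// regardless of how many public subnets (one per AZ) the VPC has.
func planEndpoints(services []string, gateway map[string]bool, subnetIDs []string) []endpointSpec {
	specs := make([]endpointSpec, 0, len(services))
	for _, svc := range services {
		spec := endpointSpec{Service: svc, Gateway: gateway[svc]}
		if !spec.Gateway {
			// Copy the slice so callers cannot mutate the plan afterwards.
			spec.SubnetIDs = append([]string(nil), subnetIDs...)
		}
		specs = append(specs, spec)
	}
	return specs
}

func main() {
	subnets := []string{"subnet-a", "subnet-b", "subnet-c"}
	specs := planEndpoints(
		[]string{"s3", "ecr.api", "ecr.dkr"},
		map[string]bool{"s3": true},
		subnets,
	)
	for _, s := range specs {
		fmt.Printf("%s gateway=%t subnets=%d\n", s.Service, s.Gateway, len(s.SubnetIDs))
	}
}
```

A per-subnet loop would instead try to create the S3 gateway endpoint once per availability zone, which is the conflict the extraction avoids.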

Other

  • Remove AWS CLI dependency from EKS cluster creation
  • Add resource tags to all EKS-specific AWS resources (cluster, IAM roles, OIDC provider, addons, ASG, etc.)
  • Extend EKS documentation

Resolves #499

@zdrapela zdrapela changed the title feat(eks): self-managed node groups with spot support and cluster reliability fixes feat(eks): self-managed node groups with spot support, cluster reliability fixes, and VPC endpoint extraction Apr 7, 2026
@zdrapela zdrapela changed the title feat(eks): self-managed node groups with spot support, cluster reliability fixes, and VPC endpoint extraction feat(eks): enhancements and fixes for EKS Apr 7, 2026
@zdrapela zdrapela marked this pull request as ready for review April 7, 2026 13:34
@zdrapela
Member Author

zdrapela commented Apr 7, 2026

@adrianriobo Hi, I tested this PR on creating an EKS cluster, but I haven't tested any other infra creation, which may be affected.
If this PR is too big, I can split it.
I also created the Tekton task, but unfortunately I still don't have anywhere to test it.

@adrianriobo
Collaborator

Hey, nice contribution. I think most of the changes should not affect other targets, but I want to give it a try. In any case, can you clean up the commits a bit? You could group them into EKS improvements and networking improvements. WDYT?

zdrapela added 3 commits April 8, 2026 13:30
…ging

- Replace managed node group with self-managed ASG for spot price control
- Add tekton task for EKS cluster management
- Add resource tags to all EKS-specific AWS resources
- Resolve EKS cluster creation failures
- Fix EKS creation without AWS CLI
- Extend EKS documentation
@zdrapela
Member Author

zdrapela commented Apr 8, 2026

Sure, I split it 👍

@adrianriobo
Collaborator

@anjannath would you find time to review this one?

@adrianriobo adrianriobo requested a review from anjannath April 16, 2026 11:29
  PublicSubnetsCIDRs: network.GeneratePublicSubnetCIDRs(len(r.availabilityZones)),
  Region: *r.allocationData.Region,
- NatGatewayMode: &network.NatGatewayModeSingle,
+ NatGatewayMode: &network.NatGatewayModeNone,
Collaborator

This seems like it could go into its own commit. It is changed to NatGatewayModeNone because MapPublicIp is true, which means a NAT gateway is not needed; we can include that information in the commit log.

func createSelfManagedNodeGroup(ctx *pulumi.Context, args *selfManagedNodeGroupArgs) (*autoscaling.Group, error) {
// Look up EKS-optimized AL2023 AMI
eksAMI, err := ami.GetAMIByName(ctx,
fmt.Sprintf("amazon-eks-node-al2023-x86_64-standard-%s-*", args.kubernetesVersion),
Collaborator

The eks command also supports the --arch flag, so we need to set the arch substring in the AMI name here; right now it is hardcoded to x86_64.
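One way to address this could look like the sketch below. The helper names `amiArch` and `eksAMINamePattern` are hypothetical, the set of accepted `--arch` values is an assumption, and the `arm64` substring follows the naming of the published EKS-optimized AL2023 AMIs rather than anything stated in this PR.

```go
package main

import "fmt"

// amiArch maps a CLI architecture value to the substring used in
// EKS-optimized AL2023 AMI names. The accepted inputs are an assumption.
func amiArch(arch string) (string, error) {
	switch arch {
	case "x86_64", "amd64":
		return "x86_64", nil
	case "arm64", "aarch64":
		return "arm64", nil
	default:
		return "", fmt.Errorf("unsupported architecture %q", arch)
	}
}

// eksAMINamePattern builds the AMI name filter for a given architecture
// and Kubernetes version instead of hardcoding x86_64.
func eksAMINamePattern(arch, kubernetesVersion string) (string, error) {
	a, err := amiArch(arch)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("amazon-eks-node-al2023-%s-standard-%s-*", a, kubernetesVersion), nil
}

func main() {
	for _, arch := range []string{"x86_64", "arm64"} {
		pattern, err := eksAMINamePattern(arch, "1.31")
		if err != nil {
			panic(err)
		}
		fmt.Println(pattern)
	}
}
```

The selected instance types would also have to match the chosen architecture, which this sketch does not check.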

Development

Successfully merging this pull request may close these issues.

EKS Enhancements

3 participants