New serverless pattern - Step Function to durable Lambda function #2975
sin-ak wants to merge 2 commits into aws-samples:main
| ## When to Use This Pattern | ||
| Use this pattern when: | ||
| - Your Lambda function execution time exceeds 15 minutes and must be orchestrated by Step Functions |
nit: the 15-minute limit is not a differentiator between Step Functions (SFN) and durable functions (DF). I think the reasons for combining both are 1/ existing SFN experience and workflows, and 2/ simplifying Lambda orchestration logic, e.g., reducing the number of functions, calls, and overall complexity when building hybrid workflows.
|
|
||
| Important: this application uses various AWS services and there are costs associated with these services after the Free Tier usage - please see the [AWS Pricing page](https://aws.amazon.com/pricing/) for details. You are responsible for any AWS costs incurred. No warranty is implied in this example. | ||
|
|
||
| ## When to Use This Pattern |
note: I'm reading this section as "when to use this serverless pattern", but it reads more like a DF vs SFN comparison. Consider streamlining this section to match the use case, and refer to the user guide for further guidance.
| async_durable_lambda_fn.add_to_role_policy( | ||
| iam.PolicyStatement( | ||
| actions=[ | ||
| "states:SendTaskSuccess", | ||
| "states:SendTaskFailure", | ||
| "states:SendTaskHeartbeat" | ||
| ], | ||
| resources=[f"arn:aws:states:{self.region}:{self.account}:stateMachine:*"] | ||
| ) | ||
| ) |
The inline IAM policy granting states:SendTaskSuccess, states:SendTaskFailure, and states:SendTaskHeartbeat uses a wildcard resource (arn:aws:states:{region}:{account}:stateMachine:*). This grants the Lambda function permission to send task callbacks to any state machine in the stack's account and Region. The policy should be scoped to the specific state machine created in this stack.
| async_durable_lambda_fn.add_to_role_policy( | |
| iam.PolicyStatement( | |
| actions=[ | |
| "states:SendTaskSuccess", | |
| "states:SendTaskFailure", | |
| "states:SendTaskHeartbeat" | |
| ], | |
| resources=[f"arn:aws:states:{self.region}:{self.account}:stateMachine:*"] | |
| ) | |
| ) | |
| async_durable_lambda_fn.add_to_role_policy( | |
| iam.PolicyStatement( | |
| actions=[ | |
| "states:SendTaskSuccess", | |
| "states:SendTaskFailure", | |
| "states:SendTaskHeartbeat" | |
| ], | |
| resources=[state_machine.state_machine_arn] | |
| ) | |
| ) |
|
|
||
| Announced at re:Invent 2025, [Lambda durable functions](https://docs.aws.amazon.com/lambda/latest/dg/durable-functions.html) introduce a checkpoint/replay mechanism that allows Lambda executions to run for up to one year, automatically recovering from interruptions. This pattern shows how to combine durable functions with Step Functions in a hybrid architecture: durable functions handle application-level logic within Lambda, while Step Functions coordinates the high-level workflow across multiple AWS services. | ||
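As a rough mental model of the checkpoint/replay mechanism described in the README text above, the toy sketch below saves each step's result so that a re-run after an interruption skips work that already completed. `ToyDurableContext` and its `step(name, fn)` method are hypothetical names invented for this illustration. They are not the Lambda durable functions SDK API, and the real service persists checkpoints outside the process rather than in a dict.

```python
from typing import Any, Callable, Dict, List


class ToyDurableContext:
    """Hypothetical stand-in for a durable execution context (not the AWS SDK)."""

    def __init__(self, checkpoints: Dict[str, Any]):
        # The checkpoint store outlives a single run; here it is a plain dict.
        self.checkpoints = checkpoints

    def step(self, name: str, fn: Callable[[], Any]) -> Any:
        # Replay: a step that already completed returns its saved result.
        if name in self.checkpoints:
            return self.checkpoints[name]
        result = fn()
        self.checkpoints[name] = result  # checkpoint only after success
        return result


def run_with_one_interruption() -> Dict[str, Any]:
    calls: List[str] = []

    def reserve() -> str:
        calls.append("reserve")
        return "reserved"

    def charge() -> str:
        calls.append("charge")
        if calls.count("charge") == 1:
            raise RuntimeError("simulated interruption")
        return "charged"

    def handler(ctx: ToyDurableContext) -> List[str]:
        return [ctx.step("reserve", reserve), ctx.step("charge", charge)]

    store: Dict[str, Any] = {}
    try:
        handler(ToyDurableContext(store))  # first run is interrupted in "charge"
    except RuntimeError:
        pass
    result = handler(ToyDurableContext(store))  # replay skips "reserve"
    return {"result": result, "calls": calls}
```

In this sketch the `reserve` step runs once even though the handler runs twice, which is the property that lets a long-running execution resume instead of restarting.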
|
|
||
| Learn more about this pattern at Serverless Land Patterns: << Add the live URL here >> |
| Learn more about this pattern at Serverless Land Patterns: << Add the live URL here >> | |
| Learn more about this pattern at Serverless Land Patterns: [cdk-stepfunction-durable-lambda-function](https://serverlessland.com/patterns/cdk-stepfunction-durable-lambda-function) |
| * [Create an AWS account](https://portal.aws.amazon.com/gp/aws/developer/registration/index.html) if you do not already have one and log in. The IAM user that you use must have sufficient permissions to make necessary AWS service calls and manage AWS resources. | ||
| * [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html) installed and configured | ||
| * [Git Installed](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git) | ||
| * [AWS Cloud Development Kit](https://docs.aws.amazon.com/cdk/v2/guide/getting_started.html) (AWS CDK >= 2.240.0) Installed |
| * [AWS Cloud Development Kit](https://docs.aws.amazon.com/cdk/v2/guide/getting_started.html) (AWS CDK >= 2.240.0) Installed | |
| * [AWS Cloud Development Kit](https://docs.aws.amazon.com/cdk/v2/guide/getting_started.html) (AWS CDK >= 2.240.0) Installed | |
| * [Python](https://www.python.org/downloads/) (3.12 or later) |
| @@ -0,0 +1,68 @@ | |||
| { | |||
| "title": "Step Functions to Lambda durable functions", | |||
| "title": "Step Functions to Lambda durable functions", | |
| "title": "AWS Step Functions to AWS Lambda durable functions", |
| "introBox": { | ||
| "headline": "How it works", | ||
| "text": [ | ||
| "This pattern demonstrates how to integrate AWS Lambda Durable Functions into an AWS Step Functions workflow. ", |
| "This pattern demonstrates how to integrate AWS Lambda Durable Functions into an AWS Step Functions workflow. ", | |
| "This pattern demonstrates how to integrate AWS Lambda durable functions into an AWS Step Functions workflow. ", |
| # Send callback as the FINAL durable step | ||
| if task_token: | ||
|     context.logger.info("Resuming Step Function by calling send_task_success with task_token") | ||
|     context.step(send_sfn_task_success(task_token, response)) |
The send_sfn_task_success step has no error handling for terminal failure. If this step fails after all retries are exhausted, the Lambda execution fails silently from the Step Functions state machine's perspective — neither SendTaskSuccess nor SendTaskFailure is called. The state machine stalls until the HeartbeatSeconds or task-level timeout fires, with no actionable error signal and potentially a long delay before the execution is unblocked.
While it is OK for patterns to stick with general default behavior, we should be clear about good practices, especially when multiple services are orchestrated, by either 1/ explicitly documenting the behavior ("we are not doing more because of, ...") or 2/ implementing it simply (and optionally adding some explanation).
Please use the snippet below as an example.
| context.step(send_sfn_task_success(task_token, response)) | |
| import boto3 | |
| sfn_client = boto3.client("stepfunctions") | |
| try: | |
|     context.step(send_sfn_task_success(task_token, response)) | |
| except Exception as e: | |
|     sfn_client.send_task_failure( | |
|         taskToken=task_token, | |
|         error=type(e).__name__, | |
|         cause=str(e)[:256]  # SFN cause field is capped at 32768 chars, trim as appropriate | |
|     ) | |
|     raise |
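The fallback idea in the suggestion above can also be sketched as a standalone helper, which makes it easy to exercise with a stub client. This is a minimal sketch under assumptions, not the pattern's actual code: `report_to_step_functions` is a hypothetical name, and how such a call should be placed relative to a durable step is not addressed here. The client is assumed to expose the boto3 Step Functions methods `send_task_success` and `send_task_failure`.

```python
import json
from typing import Any, Dict

MAX_ERROR_LEN = 256    # length limit of the SendTaskFailure `error` field
MAX_CAUSE_LEN = 32768  # length limit of the SendTaskFailure `cause` field


def report_to_step_functions(sfn_client: Any, task_token: str, output: Dict[str, Any]) -> str:
    """Try to resume the state machine; on any error, report a task failure instead.

    Returns "success" or "failure" so callers can log which path was taken.
    """
    try:
        sfn_client.send_task_success(taskToken=task_token, output=json.dumps(output))
        return "success"
    except Exception as exc:
        # Give the state machine an explicit error signal rather than
        # leaving it to wait for a heartbeat or task timeout.
        sfn_client.send_task_failure(
            taskToken=task_token,
            error=type(exc).__name__[:MAX_ERROR_LEN],
            cause=str(exc)[:MAX_CAUSE_LEN],
        )
        return "failure"
```

If the success call failed because the token itself is no longer valid (for example, the task already timed out), the failure call is likely to raise as well, so callers may still want to log and surface that exception.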
| * `cdk docs` open CDK documentation | ||
|
|
||
| ---- | ||
| Copyright 2025 Amazon.com, Inc. or its affiliates. All Rights Reserved. |
| Copyright 2025 Amazon.com, Inc. or its affiliates. All Rights Reserved. | |
| Copyright 2026 Amazon.com, Inc. or its affiliates. All Rights Reserved. |
Issue #, if available: #2976
Description of changes:
This pattern demonstrates how to integrate AWS Lambda durable functions into an AWS Step Functions workflow. It covers both synchronous invocation (using the default Request Response integration pattern) and asynchronous invocation (using the Step Functions Wait for a Callback with Task Token integration pattern) of the durable Lambda function. It addresses the challenge of running Lambda functions for longer than 15 minutes within a Step Functions orchestration by combining asynchronous invocation with durable checkpointing.
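For readers unfamiliar with the callback integration mentioned above, the sketch below builds an Amazon States Language Task state that passes the task token to a Lambda function and then waits for `SendTaskSuccess` or `SendTaskFailure`. It is only an illustration, not the state machine defined in this PR: the helper name `callback_task_state`, the payload keys `taskToken` and `input`, and the default heartbeat value are assumptions.

```python
import json
from typing import Any, Dict


def callback_task_state(function_arn: str, heartbeat_seconds: int = 300) -> Dict[str, Any]:
    """Build a Task state that waits for a callback carrying the task token."""
    return {
        "Type": "Task",
        # The .waitForTaskToken suffix pauses the state until a callback arrives.
        "Resource": "arn:aws:states:::lambda:invoke.waitForTaskToken",
        "Parameters": {
            "FunctionName": function_arn,
            "Payload": {
                # "$$.Task.Token" reads the token from the context object.
                "taskToken.$": "$$.Task.Token",  # key name is an assumption
                "input.$": "$",                  # forward the state input
            },
        },
        # Fail the state if no heartbeat or callback arrives within this window.
        "HeartbeatSeconds": heartbeat_seconds,
        "End": True,
    }


if __name__ == "__main__":
    # Placeholder ARN for illustration only.
    arn = "arn:aws:lambda:us-east-1:123456789012:function:example"
    print(json.dumps(callback_task_state(arn), indent=2))
```

The Lambda handler would need to read the token from the same payload key and pass it back in the callback.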
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.