diff --git a/AGENTS.md b/AGENTS.md index f3b58332..e64b6b62 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -926,9 +926,4 @@ Resources: **Python SDK:** - [SDK Repository](https://github.com/aws/aws-durable-execution-sdk-python) -- [Documentation Index](https://github.com/aws/aws-durable-execution-sdk-python/blob/main/docs/index.md) -- [Getting Started](https://github.com/aws/aws-durable-execution-sdk-python/blob/main/docs/getting-started.md) -- [Steps](https://github.com/aws/aws-durable-execution-sdk-python/blob/main/docs/core/steps.md) -- [Wait Operations](https://github.com/aws/aws-durable-execution-sdk-python/blob/main/docs/core/wait.md) -- [Callbacks](https://github.com/aws/aws-durable-execution-sdk-python/blob/main/docs/core/callbacks.md) -- [Testing Patterns](https://github.com/aws/aws-durable-execution-sdk-python/blob/main/docs/testing-patterns/basic-tests.md) +- [AWS Durable Execution Documentation](https://docs.aws.amazon.com/durable-execution/) diff --git a/README.md b/README.md index 5b650d94..3a772717 100644 --- a/README.md +++ b/README.md @@ -68,33 +68,10 @@ def handler(event: dict, context: DurableContext) -> dict: ## 📚 Documentation -- **[AWS Documentation](https://docs.aws.amazon.com/lambda/latest/dg/durable-functions.html)** - Official AWS Lambda durable functions guide -- **[Documentation index](docs/index.md)** - SDK Overview and navigation - -**New to durable functions?** -- [Getting started guide](docs/getting-started.md) - Build your first durable function - -**Core operations:** -- [Steps](docs/core/steps.md) - Execute code with automatic checkpointing and retry support -- [Wait operations](docs/core/wait.md) - Pause execution without blocking Lambda resources -- [Callbacks](docs/core/callbacks.md) - Wait for external systems to respond -- [Invoke operations](docs/core/invoke.md) - Call other durable functions and compose workflows -- [Child contexts](docs/core/child-contexts.md) - Organize complex workflows into isolated units -- [Parallel 
operations](docs/core/parallel.md) - Run multiple operations concurrently -- [Map operations](docs/core/map.md) - Process collections in parallel with batching -- [Logger integration](docs/core/logger.md) - Add structured logging to track execution - -**Advanced topics:** -- [Error handling](docs/advanced/error-handling.md) - Handle failures and implement retry strategies -- [Testing modes](docs/advanced/testing-modes.md) - Run tests locally or against deployed Lambda functions -- [Testing patterns](docs/testing-patterns/basic-tests.md) - Practical testing examples -- [Serialization](docs/advanced/serialization.md) - Customize how data is serialized in checkpoints - -**Architecture:** -- [Architecture diagrams](docs/architecture.md) - Class diagrams and concurrency flows - -**API reference:** -- API reference docs are in progress. Use the core operation docs above for now. +The complete documentation for the AWS Durable Execution SDK for Python lives on the AWS Documentation site: + +- **[AWS Durable Execution Documentation](https://docs.aws.amazon.com/durable-execution/)** - Concepts, getting started, core operations, advanced topics, and API reference +- **[AWS Lambda Durable Functions Guide](https://docs.aws.amazon.com/lambda/latest/dg/durable-functions.html)** - How durable functions work on Lambda ## 💬 Feedback & Support diff --git a/docs/advanced/error-handling.md b/docs/advanced/error-handling.md deleted file mode 100644 index 59f002a9..00000000 --- a/docs/advanced/error-handling.md +++ /dev/null @@ -1,955 +0,0 @@ -# Error Handling - -## Table of Contents - -- [Overview](#overview) -- [Terminology](#terminology) -- [Getting started](#getting-started) -- [Exception types](#exception-types) -- [Retry strategies](#retry-strategies) -- [Error response formats](#error-response-formats) -- [Common error scenarios](#common-error-scenarios) -- [Troubleshooting](#troubleshooting) -- [Best practices](#best-practices) -- [FAQ](#faq) -- [Testing](#testing) -- [See 
also](#see-also) - -[← Back to main index](../index.md) - -## Overview - -Error handling in durable functions determines how your code responds to failures. The SDK provides typed exceptions, automatic retry with exponential backoff, and AWS-compliant error responses that help you build resilient workflows. - -When errors occur, the SDK can: -- Retry transient failures automatically with configurable backoff -- Checkpoint failures with detailed error information -- Distinguish between recoverable and unrecoverable errors -- Provide clear termination reasons and stack traces for debugging - -[↑ Back to top](#table-of-contents) - -## Terminology - -**Exception** - A Python error that interrupts normal execution flow. The SDK provides specific exception types for different failure scenarios. - -**Retry strategy** - A function that determines whether to retry an operation after an exception and how long to wait before retrying. - -**Termination reason** - A code indicating why a durable execution terminated, such as `UNHANDLED_ERROR` or `INVOCATION_ERROR`. - -**Recoverable error** - An error that can be retried, such as transient network failures or rate limiting. - -**Unrecoverable error** - An error that terminates execution immediately without retry, such as validation errors or non-deterministic execution. - -**Backoff** - The delay between retry attempts, typically increasing exponentially to avoid overwhelming failing services. 
- -[↑ Back to top](#table-of-contents) - -## Getting started - -Here's a simple example of handling errors in a durable function: - -```python -from aws_durable_execution_sdk_python import ( - DurableContext, - durable_execution, - durable_step, - StepContext, -) - -@durable_step -def process_order(step_context: StepContext, order_id: str) -> dict: - """Process an order with validation.""" - if not order_id: - raise ValueError("Order ID is required") - - # Process the order - return {"order_id": order_id, "status": "processed"} - -@durable_execution -def handler(event: dict, context: DurableContext) -> dict: - """Handle order processing with error handling.""" - try: - order_id = event.get("order_id") - result = context.step(process_order(order_id)) - return result - except ValueError as e: - # Handle validation errors from your code - return {"error": "InvalidInput", "message": str(e)} -``` - -When this function runs: -1. If `order_id` is missing, `ValueError` is raised from your code -2. The exception is caught and handled gracefully -3. A structured error response is returned to the caller - -[↑ Back to top](#table-of-contents) - -## Exception types - -The SDK provides several exception types for different failure scenarios. 
- -### Exception summary - -| Exception | Retryable | Behavior | Use case | -|-----------|-----------|----------|----------| -| `ValidationError` | No | Fails immediately | SDK detects invalid arguments | -| `ExecutionError` | No | Returns FAILED status | Permanent business logic failures | -| `InvocationError` | Yes (by Lambda) | Lambda retries invocation | Transient infrastructure issues | -| `CallbackError` | No | Returns FAILED status | Callback handling failures | -| `StepInterruptedError` | Yes (automatic) | Retries on next invocation | Step interrupted before checkpoint | -| `CheckpointError` | Depends | Permanent on 4xx non-429 (except invalid checkpoint token); retries otherwise | Failed to save execution state | -| `SerDesError` | No | Returns FAILED status | Serialization failures | - -### Base exceptions - -**DurableExecutionsError** - Base class for all SDK exceptions. - -```python -from aws_durable_execution_sdk_python import DurableExecutionsError - -try: - # Your code here - pass -except DurableExecutionsError as e: - # Handle any SDK exception - print(f"SDK error: {e}") -``` - -**UnrecoverableError** - Base class for errors that terminate execution. These errors include a `termination_reason` attribute. - -```python -from aws_durable_execution_sdk_python import ( - ExecutionError, - InvocationError, -) - -try: - # Your code here - pass -except (ExecutionError, InvocationError) as e: - # Access termination reason from unrecoverable errors - print(f"Execution terminated: {e.termination_reason}") -``` - -### Validation errors - -**ValidationError** - Raised by the SDK when you pass invalid arguments to SDK operations. 
- -```python -from aws_durable_execution_sdk_python import ( - DurableContext, - durable_execution, - ValidationError, -) -from aws_durable_execution_sdk_python.config import CallbackConfig - -@durable_execution -def handler(event: dict, context: DurableContext) -> dict: - """Handle SDK validation errors.""" - try: - # SDK raises ValidationError if timeout is invalid - callback = context.create_callback( - config=CallbackConfig(timeout_seconds=-1), # Invalid! - name="approval" - ) - return {"callback_id": callback} - except ValidationError as e: - # SDK caught invalid configuration - return {"error": "InvalidConfiguration", "message": str(e)} -``` - -The SDK raises `ValidationError` when: -- Operation arguments are invalid (negative timeouts, empty names) -- Required parameters are missing -- Configuration values are out of range - -### Execution errors - -**ExecutionError** - Raised when execution fails in a way that shouldn't be retried. Returns `FAILED` status without retry. - -```python -from aws_durable_execution_sdk_python import ExecutionError - -@durable_step -def process_data(step_context: StepContext, data: dict) -> dict: - """Process data with business logic validation.""" - if not data.get("required_field"): - raise ExecutionError("Required field missing") - return {"processed": True} -``` - -Use `ExecutionError` for: -- Business logic failures -- Invalid data that won't be fixed by retry -- Permanent failures that should fail fast - -### Invocation errors - -**InvocationError** - Raised when Lambda should retry the entire invocation. Causes Lambda to retry by throwing from the handler. 
- -```python -from aws_durable_execution_sdk_python import InvocationError - -@durable_step -def call_external_service(step_context: StepContext) -> dict: - """Call external service with retry.""" - try: - # Call external service - response = make_api_call() - return response - except ConnectionError: - # Trigger Lambda retry - raise InvocationError("Service unavailable") -``` - -Use `InvocationError` for: -- Service unavailability -- Network failures -- Transient infrastructure issues - -### Callback errors - -**CallbackError** - Raised when callback handling fails. - -```python -from aws_durable_execution_sdk_python import CallbackError -from aws_durable_execution_sdk_python.config import CallbackConfig - -@durable_execution -def handler(event: dict, context: DurableContext) -> dict: - """Handle callback with error handling.""" - try: - callback = context.create_callback( - config=CallbackConfig(timeout_seconds=3600), - name="approval" - ) - context.wait_for_callback(callback) - return {"status": "approved"} - except CallbackError as e: - return {"error": "CallbackError", "callback_id": e.callback_id} -``` - -### Step interrupted errors - -**StepInterruptedError** - Raised when a step is interrupted before checkpointing. - -```python -from aws_durable_execution_sdk_python import StepInterruptedError - -# This can happen if Lambda times out during step execution -@durable_execution -def handler(event: dict, context: DurableContext) -> dict: - try: - result = context.step(long_running_operation()) - return result - except StepInterruptedError as e: - # Step was interrupted, will retry on next invocation - context.logger.warning(f"Step interrupted: {e.step_id}") - raise -``` - -### Serialization errors - -**SerDesError** - Raised when serialization or deserialization fails. 
- -```python -from aws_durable_execution_sdk_python import SerDesError - -@durable_step -def process_complex_data(step_context: StepContext, data: object) -> dict: - """Process data that might not be serializable.""" - try: - # Process data - return {"result": data} - except SerDesError as e: - # Handle serialization failure - return {"error": "Cannot serialize result"} -``` - -[↑ Back to top](#table-of-contents) - -## Retry strategies - -Configure retry behavior for steps using retry strategies. - -### Creating retry strategies - -Use `RetryStrategyConfig` to define retry behavior: - -```python -from aws_durable_execution_sdk_python import ( - DurableContext, - durable_execution, - durable_step, - StepContext, -) -from aws_durable_execution_sdk_python.config import StepConfig -from aws_durable_execution_sdk_python.retries import ( - RetryStrategyConfig, - create_retry_strategy, -) - -@durable_step -def unreliable_operation(step_context: StepContext) -> str: - """Operation that might fail.""" - # Your code here - return "success" - -@durable_execution -def handler(event: dict, context: DurableContext) -> str: - # Configure retry strategy - retry_config = RetryStrategyConfig( - max_attempts=3, - initial_delay_seconds=1, - max_delay_seconds=10, - backoff_rate=2.0, - retryable_error_types=[RuntimeError, ConnectionError], - ) - - # Create step config with retry - step_config = StepConfig( - retry_strategy=create_retry_strategy(retry_config) - ) - - # Execute with retry - result = context.step(unreliable_operation(), config=step_config) - return result -``` - -### RetryStrategyConfig parameters - -**max_attempts** - Maximum number of attempts (including the initial attempt). Default: 3. - -**initial_delay_seconds** - Initial delay before first retry in seconds. Default: 5. - -**max_delay_seconds** - Maximum delay between retries in seconds. Default: 300 (5 minutes). - -**backoff_rate** - Multiplier for exponential backoff. Default: 2.0. 
- -**jitter_strategy** - Jitter strategy to add randomness to delays. Default: `JitterStrategy.FULL`. - -**retryable_errors** - List of error message patterns to retry (strings or regex patterns). Default: matches all errors. - -**retryable_error_types** - List of exception types to retry. Default: empty (retry all). - -### Retry presets - -The SDK provides preset retry strategies for common scenarios: - -```python -from aws_durable_execution_sdk_python.retries import RetryPresets -from aws_durable_execution_sdk_python.config import StepConfig - -# No retries -step_config = StepConfig(retry_strategy=RetryPresets.none()) - -# Default retries (6 attempts, 5s initial delay) -step_config = StepConfig(retry_strategy=RetryPresets.default()) - -# Quick retries for transient errors (3 attempts) -step_config = StepConfig(retry_strategy=RetryPresets.transient()) - -# Longer retries for resource availability (5 attempts, up to 5 minutes) -step_config = StepConfig(retry_strategy=RetryPresets.resource_availability()) - -# Aggressive retries for critical operations (10 attempts) -step_config = StepConfig(retry_strategy=RetryPresets.critical()) -``` - -### Retrying specific exceptions - -Only retry certain exception types: - -```python -from random import random -from aws_durable_execution_sdk_python import ( - DurableContext, - durable_execution, - durable_step, - StepContext, -) -from aws_durable_execution_sdk_python.config import StepConfig -from aws_durable_execution_sdk_python.retries import ( - RetryStrategyConfig, - create_retry_strategy, -) - -@durable_step -def call_api(step_context: StepContext) -> dict: - """Call external API that might fail.""" - if random() > 0.5: - raise ConnectionError("Network timeout") - return {"status": "success"} - -@durable_execution -def handler(event: dict, context: DurableContext) -> dict: - # Only retry ConnectionError, not other exceptions - retry_config = RetryStrategyConfig( - max_attempts=3, - retryable_error_types=[ConnectionError], 
- ) - - result = context.step( - call_api(), - config=StepConfig(create_retry_strategy(retry_config)), - ) - - return result -``` - -### Exponential backoff - -Configure exponential backoff to avoid overwhelming failing services: - -```python -retry_config = RetryStrategyConfig( - max_attempts=5, - initial_delay_seconds=1, # First retry after 1 second - max_delay_seconds=60, # Cap at 60 seconds - backoff_rate=2.0, # Double delay each time: 1s, 2s, 4s, 8s, 16s... -) -``` - -With this configuration: -- Attempt 1: Immediate -- Attempt 2: After 1 second -- Attempt 3: After 2 seconds -- Attempt 4: After 4 seconds -- Attempt 5: After 8 seconds - -[↑ Back to top](#table-of-contents) - -## Error response formats - -The SDK follows AWS service conventions for error responses. - -### Error response structure - -When a durable function fails, the response includes: - -```json -{ - "errorType": "ExecutionError", - "errorMessage": "Order validation failed", - "termination_reason": "EXECUTION_ERROR", - "stackTrace": [ - " File \"/var/task/handler.py\", line 42, in process_order", - " raise ExecutionError(\"Order validation failed\")" - ] -} -``` - -### Termination reasons - -**UNHANDLED_ERROR** - An unhandled exception occurred in user code. - -**INVOCATION_ERROR** - Lambda should retry the invocation. - -**EXECUTION_ERROR** - Execution failed and shouldn't be retried. - -**CHECKPOINT_FAILED** - Failed to checkpoint execution state. - -**NON_DETERMINISTIC_EXECUTION** - Execution produced different results on replay. - -**STEP_INTERRUPTED** - A step was interrupted before completing. - -**CALLBACK_ERROR** - Callback handling failed. - -**SERIALIZATION_ERROR** - Failed to serialize or deserialize data. 
- -### HTTP status codes - -When calling durable functions via API Gateway or Lambda URLs: - -- **200 OK** - Execution succeeded -- **400 Bad Request** - Validation error or invalid input -- **500 Internal Server Error** - Execution error or unhandled exception -- **503 Service Unavailable** - Invocation error (Lambda will retry) - -[↑ Back to top](#table-of-contents) - -## Common error scenarios - -### Handling input validation - -Validate input early and return clear error messages: - -```python -from aws_durable_execution_sdk_python import ( - DurableContext, - durable_execution, -) - -@durable_execution -def handler(event: dict, context: DurableContext) -> dict: - """Validate input and handle errors.""" - # Validate required fields - if not event.get("user_id"): - return {"error": "InvalidInput", "message": "user_id is required"} - - if not event.get("action"): - return {"error": "InvalidInput", "message": "action is required"} - - # Process valid input - user_id = event["user_id"] - action = event["action"] - - result = context.step( - lambda _: {"user_id": user_id, "action": action, "status": "completed"}, - name="process_action" - ) - - return result -``` - -### Handling transient failures - -Retry transient failures automatically: - -```python -from aws_durable_execution_sdk_python import ( - DurableContext, - durable_execution, - durable_step, - StepContext, -) -from aws_durable_execution_sdk_python.config import StepConfig -from aws_durable_execution_sdk_python.retries import RetryPresets - -@durable_step -def call_external_api(step_context: StepContext, endpoint: str) -> dict: - """Call external API with retry.""" - # API call that might fail transiently - response = make_http_request(endpoint) - return response - -@durable_execution -def handler(event: dict, context: DurableContext) -> dict: - """Handle API calls with automatic retry.""" - # Use transient preset for quick retries - step_config = StepConfig(retry_strategy=RetryPresets.transient()) - - 
try: - result = context.step( - call_external_api(event["endpoint"]), - config=step_config, - ) - return {"status": "success", "data": result} - except Exception as e: - # All retries exhausted - return {"status": "failed", "error": str(e)} -``` - -### Handling permanent failures - -Fail fast for permanent errors: - -```python -from aws_durable_execution_sdk_python import ( - DurableContext, - durable_execution, - durable_step, - ExecutionError, - StepContext, -) - -@durable_step -def process_payment(step_context: StepContext, amount: float, card: str) -> dict: - """Process payment with validation.""" - # Validate card - if not is_valid_card(card): - # Don't retry invalid cards - raise ExecutionError("Invalid card number") - - # Process payment - return {"transaction_id": "txn_123", "amount": amount} - -@durable_execution -def handler(event: dict, context: DurableContext) -> dict: - """Handle payment with error handling.""" - try: - result = context.step( - process_payment(event["amount"], event["card"]) - ) - return {"status": "success", "transaction": result} - except ExecutionError as e: - # Permanent failure, don't retry - return {"status": "failed", "error": str(e)} -``` - -### Handling multiple error types - -Handle different error types appropriately: - -```python -from aws_durable_execution_sdk_python import ( - DurableContext, - durable_execution, - durable_step, - ExecutionError, - InvocationError, - ValidationError, - StepContext, -) - -@durable_step -def complex_operation(step_context: StepContext, data: dict) -> dict: - """Operation with multiple failure modes.""" - # Validate input - if not data: - raise ValueError("Data is required") - - # Check business rules - if data.get("amount", 0) < 0: - raise ExecutionError("Amount must be positive") - - # Call external service - try: - result = call_external_service(data) - return result - except ConnectionError: - # Transient failure - raise InvocationError("Service unavailable") - -@durable_execution -def 
handler(event: dict, context: DurableContext) -> dict: - """Handle multiple error types.""" - try: - result = context.step(complex_operation(event)) - return {"status": "success", "result": result} - except ValueError as e: - return {"status": "invalid", "error": str(e)} - except ExecutionError as e: - return {"status": "failed", "error": str(e)} - except InvocationError as e: - # Let Lambda retry - raise -``` - -[↑ Back to top](#table-of-contents) - -## Troubleshooting - -### Step retries exhausted - -**Problem:** Your step fails after exhausting all retry attempts. - -**Cause:** The operation continues to fail, or the error isn't retryable. - -**Solution:** Check your retry configuration and error types: - -```python -# Ensure you're retrying the right errors -retry_config = RetryStrategyConfig( - max_attempts=5, # Increase attempts - retryable_error_types=[ConnectionError, TimeoutError], # Add error types -) -``` - -### Checkpoint failed errors - -**Problem:** Execution fails with `CheckpointError`. - -**Cause:** Failed to save execution state, possibly due to payload size limits or service issues. - -**Solution:** Reduce checkpoint payload size or check service health: - -```python -# Reduce payload size by returning only necessary data -@durable_step -def large_operation(step_context: StepContext) -> dict: - # Process large data - large_result = process_data() - - # Return only summary, not full data - return {"summary": large_result["summary"], "count": len(large_result["items"])} -``` - -### Callback timeout - -**Problem:** Callback times out before receiving a response. - -**Cause:** External system didn't respond within the timeout period. 
- -**Solution:** Increase callback timeout or implement retry logic: - -```python -from aws_durable_execution_sdk_python.config import CallbackConfig - -# Increase timeout -callback = context.create_callback( - config=CallbackConfig( - timeout_seconds=7200, # 2 hours - heartbeat_timeout_seconds=300, # 5 minutes - ), - name="long_running_approval" -) -``` - -### Step interrupted errors - -**Problem:** Steps are interrupted before completing. - -**Cause:** Lambda timeout or memory limit reached during step execution. - -**Solution:** Increase Lambda timeout or break large steps into smaller ones: - -```python -# Break large operation into smaller steps -@durable_execution -def handler(event: dict, context: DurableContext) -> dict: - # Process in chunks instead of all at once - items = event["items"] - chunk_size = 100 - - results = [] - for i in range(0, len(items), chunk_size): - chunk = items[i:i + chunk_size] - result = context.step( - lambda _, c=chunk: process_chunk(c), - name=f"process_chunk_{i}" - ) - results.extend(result) - - return {"processed": len(results)} -``` - -[↑ Back to top](#table-of-contents) - -## Best practices - -**Validate input early** - Check for invalid input at the start of your function and return clear error responses or raise appropriate exceptions like `ValueError`. - -**Use appropriate exception types** - Choose the right exception type for each failure scenario. Use `ExecutionError` for permanent failures and `InvocationError` for transient issues. - -**Configure retry for transient failures** - Use retry strategies for operations that might fail temporarily, such as network calls or rate limits. - -**Fail fast for permanent errors** - Don't retry errors that won't be fixed by retrying, such as validation failures or business logic errors. - -**Wrap non-deterministic code in steps** - All code that produces different results on replay must be wrapped in steps, including random values, timestamps, and external API calls. 
- -**Handle errors explicitly** - Catch and handle exceptions in your code. Provide meaningful error messages to callers. - -**Log errors with context** - Use `context.logger` to log errors with execution context for debugging. - -**Keep error messages clear** - Write error messages that help users understand what went wrong and how to fix it. - -**Test error scenarios** - Write tests for both success and failure cases to ensure your error handling works correctly. - -**Monitor error rates** - Track error rates and termination reasons to identify issues in production. - -[↑ Back to top](#table-of-contents) - -## FAQ - -**Q: What's the difference between ExecutionError and InvocationError?** - -A: `ExecutionError` fails the execution without retry (returns FAILED status). `InvocationError` triggers Lambda to retry the entire invocation. Use `ExecutionError` for permanent failures and `InvocationError` for transient issues. - -**Q: How do I retry only specific exceptions?** - -A: Use `retryable_error_types` in `RetryStrategyConfig`: - -```python -retry_config = RetryStrategyConfig( - max_attempts=3, - retryable_error_types=[ConnectionError, TimeoutError], -) -``` - -**Q: Can I customize the backoff strategy?** - -A: Yes, configure `initial_delay_seconds`, `max_delay_seconds`, `backoff_rate`, and `jitter_strategy` in `RetryStrategyConfig`. - -**Q: What happens when retries are exhausted?** - -A: The step checkpoints the error and the exception propagates to your handler. You can catch and handle it there. 
- -**Q: How do I prevent duplicate operations on retry?** - -A: Use at-most-once semantics for operations with side effects: - -```python -from aws_durable_execution_sdk_python.config import StepConfig, StepSemantics - -step_config = StepConfig( - step_semantics=StepSemantics.AT_MOST_ONCE_PER_RETRY -) -``` - -**Q: Can I access error details in my code?** - -A: Yes, catch the exception and access its attributes: - -```python -try: - result = context.step(operation()) -except CallbackError as e: - print(f"Callback failed: {e.callback_id}") -except NonDeterministicExecutionError as e: - print(f"Non-deterministic step: {e.step_id}") -``` - -**Q: How do I handle errors in parallel operations?** - -A: Wrap each parallel operation in a try-except block or let errors propagate to fail the entire execution: - -```python -results = [] -for item in items: - try: - result = context.step(lambda _, i=item: process(i), name=f"process_{item}") - results.append(result) - except Exception as e: - results.append({"error": str(e)}) -``` - -**Q: What's the maximum number of retry attempts?** - -A: You can configure any number of attempts, but consider Lambda timeout limits. The default is 6 attempts. - -[↑ Back to top](#table-of-contents) - -## Testing - -You can test error handling using the testing SDK. The test runner executes your function and lets you inspect errors. 
- -### Testing successful execution - -```python -import pytest -from aws_durable_execution_sdk_python_testing import InvocationStatus -from my_function import handler - -@pytest.mark.durable_execution( - handler=handler, - lambda_function_name="my_function", -) -def test_success(durable_runner): - """Test successful execution.""" - with durable_runner: - result = durable_runner.run(input={"data": "test"}, timeout=10) - - assert result.status is InvocationStatus.SUCCEEDED -``` - -### Testing error conditions - -Test that your function handles errors correctly: - -```python -@pytest.mark.durable_execution( - handler=handler_with_validation, - lambda_function_name="validation_function", -) -def test_input_validation(durable_runner): - """Test input validation handling.""" - with durable_runner: - result = durable_runner.run(input={}, timeout=10) - - # Function should return error response for invalid input - assert result.status is InvocationStatus.SUCCEEDED - assert "error" in result.result - assert result.result["error"] == "InvalidInput" -``` - -### Testing SDK validation errors - -Test that the SDK catches invalid configuration: - -```python -@pytest.mark.durable_execution( - handler=handler_with_invalid_config, - lambda_function_name="sdk_validation_function", -) -def test_sdk_validation_error(durable_runner): - """Test SDK validation error handling.""" - with durable_runner: - result = durable_runner.run(input={}, timeout=10) - - # SDK should catch invalid configuration - assert result.status is InvocationStatus.FAILED - assert "ValidationError" in str(result.error) -``` - -### Testing retry behavior - -Test that steps retry correctly: - -```python -@pytest.mark.durable_execution( - handler=handler_with_retry, - lambda_function_name="retry_function", -) -def test_retry_success(durable_runner): - """Test that retries eventually succeed.""" - with durable_runner: - result = durable_runner.run(input={}, timeout=30) - - # Should succeed after retries - assert 
result.status is InvocationStatus.SUCCEEDED -``` - -### Testing retry exhaustion - -Test that execution fails when retries are exhausted: - -```python -@pytest.mark.durable_execution( - handler=handler_always_fails, - lambda_function_name="failing_function", -) -def test_retry_exhausted(durable_runner): - """Test that execution fails after exhausting retries.""" - with durable_runner: - result = durable_runner.run(input={}, timeout=30) - - # Should fail after all retries - assert result.status is InvocationStatus.FAILED - assert "RuntimeError" in str(result.error) -``` - -### Inspecting error details - -Inspect error details in test results: - -```python -@pytest.mark.durable_execution( - handler=handler_with_error, - lambda_function_name="error_function", -) -def test_error_details(durable_runner): - """Test error details are captured.""" - with durable_runner: - result = durable_runner.run(input={}, timeout=10) - - # Check error details - assert result.status is InvocationStatus.FAILED - assert result.error is not None - assert "error_type" in result.error - assert "message" in result.error -``` - -For more testing patterns, see: -- [Basic tests](../testing-patterns/basic-tests.md) - Simple test examples -- [Complex workflows](../testing-patterns/complex-workflows.md) - Multi-step workflow testing -- [Best practices](../testing-patterns/best-practices.md) - Testing recommendations - -[↑ Back to top](#table-of-contents) - -## See also - -- [Steps](../core/steps.md) - Configure retry for steps -- [Callbacks](../core/callbacks.md) - Handle callback errors -- [Child contexts](../core/child-contexts.md) - Error handling in nested contexts -- [Retry strategies](../api-reference/config.md) - Retry configuration reference -- [Examples](https://github.com/awslabs/aws-durable-execution-sdk-python/tree/main/examples/src/step) - Error handling examples - -[↑ Back to top](#table-of-contents) - -## License - -See the [LICENSE](../../LICENSE) file for our project's licensing. 
- -[↑ Back to top](#table-of-contents) diff --git a/docs/advanced/serialization.md b/docs/advanced/serialization.md deleted file mode 100644 index 112131aa..00000000 --- a/docs/advanced/serialization.md +++ /dev/null @@ -1,771 +0,0 @@ -# Serialization - -Learn how the SDK serializes and deserializes data for durable execution checkpoints. - -## Table of Contents - -- [Terminology](#terminology) -- [What is serialization?](#what-is-serialization) -- [Key features](#key-features) -- [Default serialization behavior](#default-serialization-behavior) -- [Supported types](#supported-types) -- [Converting non-serializable types](#converting-non-serializable-types) -- [Custom serialization](#custom-serialization) -- [Serialization in configurations](#serialization-in-configurations) -- [Best practices](#best-practices) -- [Troubleshooting](#troubleshooting) -- [FAQ](#faq) - -[← Back to main index](../index.md) - -## Terminology - -**Serialization** - Converting Python objects to strings for storage in checkpoints. - -**Deserialization** - Converting checkpoint strings back to Python objects. - -**SerDes** - Short for Serializer/Deserializer, a custom class that handles both serialization and deserialization. - -**Checkpoint** - A saved state of execution that includes serialized operation results. - -**Extended types** - Types beyond basic JSON (datetime, Decimal, UUID, bytes) that the SDK serializes automatically. - -**Envelope format** - The SDK's internal format that wraps complex types with type tags for accurate deserialization. - -[↑ Back to top](#table-of-contents) - -## What is serialization? - -Serialization converts Python objects into strings that can be stored in checkpoints. When your durable function resumes, deserialization converts those strings back into Python objects. The SDK handles this automatically for most types.
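
To make the round trip concrete, here is a minimal sketch of type-tagged serialization using only the standard library. It is an illustration of the general idea, not the SDK's implementation: the helper names (`to_envelope`, `from_envelope`, `dumps_checkpoint`, `loads_checkpoint`) and the `__type__`/`value` tag layout are assumptions made for this example, and the SDK's actual envelope format may differ.

```python
import json
from datetime import datetime, timezone
from decimal import Decimal
from uuid import UUID


def to_envelope(value):
    """Wrap types that JSON cannot represent in a tagged dict (hypothetical layout)."""
    if isinstance(value, Decimal):
        return {"__type__": "decimal", "value": str(value)}
    if isinstance(value, datetime):
        return {"__type__": "datetime", "value": value.isoformat()}
    if isinstance(value, UUID):
        return {"__type__": "uuid", "value": str(value)}
    if isinstance(value, dict):
        return {key: to_envelope(item) for key, item in value.items()}
    if isinstance(value, list):
        return [to_envelope(item) for item in value]
    return value  # str, int, float, bool and None pass through as plain JSON


def from_envelope(value):
    """Reverse to_envelope by reading the type tags back."""
    if isinstance(value, dict):
        tag = value.get("__type__")
        if tag == "decimal":
            return Decimal(value["value"])
        if tag == "datetime":
            return datetime.fromisoformat(value["value"])
        if tag == "uuid":
            return UUID(value["value"])
        return {key: from_envelope(item) for key, item in value.items()}
    if isinstance(value, list):
        return [from_envelope(item) for item in value]
    return value


def dumps_checkpoint(value) -> str:
    """Python object -> string, as a checkpoint would store it."""
    return json.dumps(to_envelope(value))


def loads_checkpoint(data: str):
    """Stored string -> Python object, with the original types restored."""
    return from_envelope(json.loads(data))
```

Calling `loads_checkpoint(dumps_checkpoint(obj))` on a dictionary holding `Decimal`, `UUID`, or timezone-aware `datetime` values returns an equal object, which is the property the envelope is meant to guarantee.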
- -[↑ Back to top](#table-of-contents) - -## Key features - -- Automatic serialization for common Python types -- Extended type support (datetime, Decimal, UUID, bytes) -- Custom serialization for complex objects -- Type preservation during round-trip serialization -- Efficient plain JSON for primitives - -[↑ Back to top](#table-of-contents) - -## Default serialization behavior - -The SDK handles most Python types automatically: - -```python -from aws_durable_execution_sdk_python import DurableContext, durable_execution -from datetime import datetime -from decimal import Decimal -from uuid import uuid4 - -@durable_execution -def handler(event: dict, context: DurableContext) -> dict: - # All these types serialize automatically - result = context.step( - process_order, - order_id=uuid4(), - amount=Decimal("99.99"), - timestamp=datetime.now() - ) - return result -``` - -The SDK serializes data automatically when: -- Checkpointing step results -- Storing callback payloads -- Passing data to child contexts -- Returning results from your handler - -[↑ Back to top](#table-of-contents) - -## Supported types - -### Primitive types - -These types serialize as plain JSON for performance: - -```python -from aws_durable_execution_sdk_python import DurableContext, durable_execution - -@durable_execution -def handler(event: dict, context: DurableContext) -> dict: - # Primitives - plain JSON - none_value = None - text = "hello" - number = 42 - decimal_num = 3.14 - flag = True - - # Simple lists of primitives - plain JSON - numbers = [1, 2, 3, 4, 5] - - return { - "none": none_value, - "text": text, - "number": number, - "decimal": decimal_num, - "flag": flag, - "numbers": numbers - } -``` - -**Supported primitive types:** -- `None` -- `str` -- `int` -- `float` -- `bool` -- Lists containing only primitives - -[↑ Back to top](#table-of-contents) - -### Extended types - -The SDK automatically handles these types using envelope format: - -```python -from 
aws_durable_execution_sdk_python import DurableContext, durable_execution -from datetime import datetime, date -from decimal import Decimal -from uuid import UUID, uuid4 - -@durable_execution -def handler(event: dict, context: DurableContext) -> dict: - # Extended types - automatic serialization - order_data = { - "order_id": uuid4(), # UUID - "amount": Decimal("99.99"), # Decimal - "created_at": datetime.now(), # datetime - "delivery_date": date.today(), # date - "signature": b"binary_signature_data", # bytes - "coordinates": (40.7128, -74.0060), # tuple - } - - result = context.step(process_order, order_data) - return result -``` - -**Supported extended types:** -- `datetime` - ISO format with timezone -- `date` - ISO date format -- `Decimal` - Precise decimal numbers -- `UUID` - Universally unique identifiers -- `bytes`, `bytearray`, `memoryview` - Binary data (base64 encoded) -- `tuple` - Immutable sequences -- `list` - Mutable sequences (including nested) -- `dict` - Dictionaries (including nested) - -[↑ Back to top](#table-of-contents) - -### Container types - -Containers can hold any supported type, including nested containers: - -```python -from aws_durable_execution_sdk_python import DurableContext, durable_execution -from datetime import datetime -from decimal import Decimal -from uuid import uuid4 - -@durable_execution -def handler(event: dict, context: DurableContext) -> dict: - # Nested structures serialize automatically - complex_data = { - "user": { - "id": uuid4(), - "created": datetime.now(), - "balance": Decimal("1234.56"), - "metadata": b"binary_data", - "coordinates": (40.7128, -74.0060), - "tags": ["premium", "verified"], - "settings": { - "notifications": True, - "theme": "dark", - "limits": { - "daily": Decimal("500.00"), - "monthly": Decimal("10000.00"), - }, - }, - } - } - - result = context.step(process_user, complex_data) - return result -``` - -[↑ Back to top](#table-of-contents) - -## Converting non-serializable types - -Some Python 
types aren't serializable by default. Convert them before passing to durable operations. - -### Dataclasses - -Convert dataclasses to dictionaries: - -```python -from dataclasses import dataclass, asdict -from aws_durable_execution_sdk_python import DurableContext, durable_execution - -@dataclass -class Order: - order_id: str - amount: float - customer: str - -@durable_execution -def handler(event: dict, context: DurableContext) -> dict: - order = Order( - order_id="ORD-123", - amount=99.99, - customer="Jane Doe" - ) - - # Convert to dict before passing to step - result = context.step(process_order, asdict(order)) - return result -``` - -### Pydantic models - -Use Pydantic's built-in serialization: - -```python -from pydantic import BaseModel -from aws_durable_execution_sdk_python import DurableContext, durable_execution - -class Order(BaseModel): - order_id: str - amount: float - customer: str - -@durable_execution -def handler(event: dict, context: DurableContext) -> dict: - order = Order( - order_id="ORD-123", - amount=99.99, - customer="Jane Doe" - ) - - # Use model_dump() to convert to dict - result = context.step(process_order, order.model_dump()) - return result -``` - -### Custom objects - -Implement `to_dict()` and `from_dict()` methods: - -```python -from aws_durable_execution_sdk_python import DurableContext, durable_execution - -class Order: - def __init__(self, order_id: str, amount: float, customer: str): - self.order_id = order_id - self.amount = amount - self.customer = customer - - def to_dict(self) -> dict: - return { - "order_id": self.order_id, - "amount": self.amount, - "customer": self.customer - } - - @classmethod - def from_dict(cls, data: dict) -> "Order": - return cls( - order_id=data["order_id"], - amount=data["amount"], - customer=data["customer"] - ) - -@durable_execution -def handler(event: dict, context: DurableContext) -> dict: - order = Order("ORD-123", 99.99, "Jane Doe") - - # Convert to dict before passing to step - result = 
context.step(process_order, order.to_dict()) - return result -``` - -[↑ Back to top](#table-of-contents) - -## Custom serialization - -Implement custom serialization for specialized needs like encryption or compression. - -### Creating a custom SerDes - -Extend the `SerDes` base class: - -```python -from aws_durable_execution_sdk_python.serdes import SerDes, SerDesContext -import json - -class UpperCaseSerDes(SerDes[str]): - """Example: Convert strings to uppercase during serialization.""" - - def serialize(self, value: str, serdes_context: SerDesContext) -> str: - return value.upper() - - def deserialize(self, data: str, serdes_context: SerDesContext) -> str: - return data.lower() -``` - -### Using custom SerDes with steps - -Pass your custom SerDes in `StepConfig`: - -```python -from aws_durable_execution_sdk_python import DurableContext, durable_execution, durable_step, StepContext -from aws_durable_execution_sdk_python.config import StepConfig -from aws_durable_execution_sdk_python.serdes import SerDes, SerDesContext -import json - -class CompressedSerDes(SerDes[dict]): - """Example: Compress large dictionaries.""" - - def serialize(self, value: dict, serdes_context: SerDesContext) -> str: - # In production, use actual compression like gzip - return json.dumps(value, separators=(',', ':')) - - def deserialize(self, data: str, serdes_context: SerDesContext) -> dict: - return json.loads(data) - -@durable_step -def process_large_data(step_context: StepContext, data: dict) -> dict: - # Process the data - return {"processed": True, "items": len(data)} - -@durable_execution -def handler(event: dict, context: DurableContext) -> dict: - large_data = {"items": [f"item_{i}" for i in range(1000)]} - - # Use custom SerDes for this step - config = StepConfig(serdes=CompressedSerDes()) - result = context.step(process_large_data(large_data), config=config) - - return result -``` - -### Encryption example - -Encrypt sensitive data in checkpoints: - -```python -from 
aws_durable_execution_sdk_python import DurableContext, durable_execution, durable_step, StepContext -from aws_durable_execution_sdk_python.config import StepConfig -from aws_durable_execution_sdk_python.serdes import SerDes, SerDesContext -import json -import base64 - -class EncryptedSerDes(SerDes[dict]): - """Example: Encrypt sensitive data (simplified for demonstration).""" - - def __init__(self, encryption_key: str): - self.encryption_key = encryption_key - - def serialize(self, value: dict, serdes_context: SerDesContext) -> str: - json_str = json.dumps(value) - # In production, use proper encryption like AWS KMS - encrypted = base64.b64encode(json_str.encode()).decode() - return encrypted - - def deserialize(self, data: str, serdes_context: SerDesContext) -> dict: - # In production, use proper decryption - decrypted = base64.b64decode(data.encode()).decode() - return json.loads(decrypted) - -@durable_step -def process_sensitive_data(step_context: StepContext, data: dict) -> dict: - return {"processed": True} - -@durable_execution -def handler(event: dict, context: DurableContext) -> dict: - sensitive_data = { - "ssn": "123-45-6789", - "credit_card": "4111-1111-1111-1111" - } - - # Encrypt data in checkpoints - config = StepConfig(serdes=EncryptedSerDes("my-key")) - result = context.step(process_sensitive_data(sensitive_data), config=config) - - return result -``` - -[↑ Back to top](#table-of-contents) - -## Serialization in configurations - -Different operations support custom serialization through their configuration objects. 
- -### StepConfig - -Control serialization for step results: - -```python -from aws_durable_execution_sdk_python import DurableContext, durable_execution -from aws_durable_execution_sdk_python.config import StepConfig - -@durable_execution -def handler(event: dict, context: DurableContext) -> dict: - config = StepConfig(serdes=CustomSerDes()) - result = context.step(my_function(), config=config) - return result -``` - -### CallbackConfig - -Control serialization for callback payloads: - -```python -from aws_durable_execution_sdk_python import DurableContext, durable_execution -from aws_durable_execution_sdk_python.config import CallbackConfig, Duration - -@durable_execution -def handler(event: dict, context: DurableContext) -> dict: - config = CallbackConfig( - timeout=Duration.from_hours(2), - serdes=CustomSerDes() - ) - callback = context.create_callback(config=config) - - # Send callback.callback_id to external system - return {"callback_id": callback.callback_id} -``` - -### MapConfig and ParallelConfig - -Control serialization for batch results: - -```python -from aws_durable_execution_sdk_python import DurableContext, durable_execution -from aws_durable_execution_sdk_python.config import MapConfig - -@durable_execution -def handler(event: dict, context: DurableContext) -> dict: - items = [1, 2, 3, 4, 5] - - # Custom serialization for BatchResult - config = MapConfig( - serdes=CustomSerDes(), # For the entire BatchResult - item_serdes=ItemSerDes() # For individual item results - ) - - result = context.map(process_item, items, config=config) - return {"processed": len(result.succeeded)} -``` - -**Note:** When both `serdes` and `item_serdes` are provided: -- `item_serdes` serializes individual item results in child contexts -- `serdes` serializes the entire `BatchResult` at the handler level - -For backward compatibility, if only `serdes` is provided, it's used for both individual items and the `BatchResult`. 
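
The fallback rule described above can be read as a single resolution step. The following is a minimal sketch under stated assumptions: `resolve_item_serdes` is a hypothetical helper written for this example, not an SDK function, and any object can stand in for a SerDes instance.

```python
from typing import Optional, TypeVar

S = TypeVar("S")


def resolve_item_serdes(serdes: Optional[S], item_serdes: Optional[S]) -> Optional[S]:
    """Pick the serializer used for individual item results (hypothetical helper)."""
    # An explicit item_serdes wins; otherwise fall back to serdes,
    # which mirrors the backward-compatible behavior described in the note.
    return item_serdes if item_serdes is not None else serdes
```

With both arguments set, items use `item_serdes` and the batch uses `serdes`; with only `serdes` set, both levels share it.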
- -[↑ Back to top](#table-of-contents) - -## Best practices - -### Use default serialization when possible - -The SDK handles most cases efficiently without custom serialization: - -```python -# Good - uses default serialization -from datetime import datetime -from decimal import Decimal - -result = context.step( - process_order, - order_id="ORD-123", - amount=Decimal("99.99"), - timestamp=datetime.now() -) -``` - -### Convert complex objects to dicts - -Convert custom objects to dictionaries before passing to durable operations: - -```python -# Good - convert to dict first -order_dict = order.to_dict() -result = context.step(process_order, order_dict) - -# Avoid - custom objects aren't serializable -result = context.step(process_order, order) # Will fail -``` - -### Keep serialized data small - -Large checkpoints might slow down execution. Keep data compact: - -```python -# Good - only checkpoint what you need -result = context.step( - process_data, - {"id": order.id, "amount": order.amount} -) - -# Avoid - large objects in checkpoints -result = context.step( - process_data, - entire_database_dump # Too large -) -``` - -### Use appropriate types - -Choose types that serialize efficiently: - -```python -# Good - Decimal for precise amounts -amount = Decimal("99.99") - -# Avoid - float for money (precision issues) -amount = 99.99 -``` - -### Test serialization round-trips - -Verify your data survives serialization: - -```python -from aws_durable_execution_sdk_python.serdes import serialize, deserialize - -def test_serialization(): - original = {"amount": Decimal("99.99")} - serialized = serialize(None, original, "test-op", "test-arn") - deserialized = deserialize(None, serialized, "test-op", "test-arn") - - assert deserialized == original -``` - -### Handle serialization errors gracefully - -Catch and handle serialization errors: - -```python -from aws_durable_execution_sdk_python.exceptions import ExecutionError - -@durable_execution -def handler(event: dict, 
context: DurableContext) -> dict: - try: - result = context.step(process_data, complex_object) - except ExecutionError as e: - if "Serialization failed" in str(e): - # Convert to serializable format - simple_data = convert_to_dict(complex_object) - result = context.step(process_data, simple_data) - else: - raise - - return result -``` - -[↑ Back to top](#table-of-contents) - -## Troubleshooting - -### Unsupported type error - -**Problem:** `SerDesError: Unsupported type: ` - -**Solution:** Convert custom objects to supported types: - -```python -# Before - fails -result = context.step(process_order, order_object) - -# After - works -result = context.step(process_order, order_object.to_dict()) -``` - -### Serialization failed error - -**Problem:** `ExecutionError: Serialization failed for id: step-123` - -**Cause:** The data contains types that can't be serialized. - -**Solution:** Check for circular references or unsupported types: - -```python -# Circular reference - fails -data = {"self": None} -data["self"] = data - -# Fix - remove circular reference -data = {"id": 123, "name": "test"} -``` - -### Type not preserved after deserialization - -**Problem:** `tuple` becomes `list` or `Decimal` becomes `float` - -**Cause:** Using a custom SerDes that doesn't preserve types. 
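
This kind of type loss is easy to reproduce with the standard `json` module, which has no tuple or decimal type. The sketch below is independent of the SDK and only shows what a plain-JSON SerDes would do to these values:

```python
import json
from decimal import Decimal

# JSON has arrays but no tuples, so a tuple comes back as a list.
restored_sequence = json.loads(json.dumps((1, 2, 3)))

# Decimal is not JSON serializable by default. A common workaround is
# default=float, which silently replaces the Decimal with a float.
restored_amount = json.loads(json.dumps({"amount": Decimal("99.99")}, default=float))

print(type(restored_sequence).__name__)          # list
print(type(restored_amount["amount"]).__name__)  # float
```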
- -**Solution:** Use default serialization which preserves types: - -```python -# Default serialization preserves tuple -result = context.step(process_data, (1, 2, 3)) # Stays as tuple - -# If using custom SerDes, ensure it preserves types -class TypePreservingSerDes(SerDes[Any]): - def serialize(self, value: Any, context: SerDesContext) -> str: - # Implement type preservation logic - pass -``` - -### Large payload errors - -**Problem:** Checkpoint size exceeds limits - -**Solution:** Reduce data size or use summary generators: - -```python -# Option 1: Reduce data -small_data = {"id": order.id, "status": order.status} -result = context.step(process_order, small_data) - -# Option 2: Use summary generator (for map/parallel) -def generate_summary(result): - return json.dumps({"count": len(result.all)}) - -config = MapConfig(summary_generator=generate_summary) -result = context.map(process_item, items, config=config) -``` - -### Datetime timezone issues - -**Problem:** Datetime loses timezone information - -**Solution:** Always use timezone-aware datetime objects: - -```python -from datetime import datetime, UTC - -# Good - timezone aware -timestamp = datetime.now(UTC) - -# Avoid - naive datetime -timestamp = datetime.now() # No timezone -``` - -[↑ Back to top](#table-of-contents) - -## FAQ - -### What types can I serialize? - -The SDK supports: -- Primitives: `None`, `str`, `int`, `float`, `bool` -- Extended: `datetime`, `date`, `Decimal`, `UUID`, `bytes`, `tuple` -- Containers: `list`, `dict` (including nested) - -For other types, convert to dictionaries first. - -### Do I need custom serialization? - -Most applications don't need custom serialization. Use it for: -- Encryption of sensitive data -- Compression of large payloads -- Special encoding requirements -- Legacy format compatibility - -### How does serialization affect performance? 
- -The SDK optimizes for performance: -- Primitives use plain JSON (fast) -- Extended types use envelope format (slightly slower but preserves types) -- Custom SerDes adds overhead based on your implementation - -### Can I serialize Pydantic models? - -Yes, convert them to dictionaries: - -```python -order = Order(order_id="ORD-123", amount=99.99) -result = context.step(process_order, order.model_dump()) -``` - -### What's the difference between serdes and item_serdes? - -In `MapConfig` and `ParallelConfig`: -- `item_serdes`: Serializes individual item results in child contexts -- `serdes`: Serializes the entire `BatchResult` at handler level - -If only `serdes` is provided, it's used for both (backward compatibility). - -### How do I handle binary data? - -Use `bytes` type - it's automatically base64 encoded: - -```python -binary_data = b"binary content" -result = context.step(process_binary, binary_data) -``` - -### Can I use JSON strings directly? - -Yes, use `PassThroughSerDes` or `JsonSerDes`: - -```python -from aws_durable_execution_sdk_python.serdes import JsonSerDes -from aws_durable_execution_sdk_python.config import StepConfig - -config = StepConfig(serdes=JsonSerDes()) -result = context.step(process_json, json_string, config=config) -``` - -### What happens if serialization fails? - -The SDK raises `ExecutionError` with details. Handle it in your code: - -```python -from aws_durable_execution_sdk_python.exceptions import ExecutionError - -try: - result = context.step(process_data, data) -except ExecutionError as e: - context.logger.error(f"Serialization failed: {e}") - # Handle error or convert data -``` - -### How do I debug serialization issues? 
- -Test serialization independently: - -```python -from aws_durable_execution_sdk_python.serdes import serialize, deserialize - -try: - serialized = serialize(None, my_data, "test-op", "test-arn") - deserialized = deserialize(None, serialized, "test-op", "test-arn") - print("Serialization successful") -except Exception as e: - print(f"Serialization failed: {e}") -``` - -### Are there size limits for serialized data? - -Yes, checkpoints have size limits (typically 256KB). Keep data compact: -- Only checkpoint necessary data -- Use summary generators for large results -- Store large data externally (S3) and checkpoint references - -[↑ Back to top](#table-of-contents) - -## See also - -- [Steps](../core/steps.md) - Using steps with custom serialization -- [Callbacks](../core/callbacks.md) - Serializing callback payloads -- [Map Operations](../core/map.md) - Serialization in map operations -- [Error Handling](error-handling.md) - Handling serialization errors -- [Best Practices](../best-practices.md) - General best practices - -[↑ Back to index](#table-of-contents) diff --git a/docs/advanced/testing-modes.md b/docs/advanced/testing-modes.md deleted file mode 100644 index 4a749201..00000000 --- a/docs/advanced/testing-modes.md +++ /dev/null @@ -1,495 +0,0 @@ -# Testing Modes: Local vs Cloud - -## Table of Contents - -- [Overview](#overview) -- [Terminology](#terminology) -- [Key features](#key-features) -- [Getting started](#getting-started) -- [Configuration](#configuration) -- [Deployment workflow](#deployment-workflow) -- [Running tests in different modes](#running-tests-in-different-modes) -- [Local vs cloud modes](#local-vs-cloud-modes) -- [Best practices](#best-practices) -- [Troubleshooting](#troubleshooting) -- [See also](#see-also) - -[← Back to main index](../index.md) - -## Overview - -The [AWS Durable Execution SDK Testing Framework](https://github.com/aws/aws-durable-execution-sdk-python-testing) (`aws-durable-execution-sdk-python-testing`) supports two
execution modes: local and cloud. Local mode runs tests in memory for fast development, while cloud mode runs tests against actual AWS Lambda functions for integration validation. - -**Local mode** uses `DurableFunctionTestRunner` to execute your function in memory without AWS deployment. It's fast, requires no credentials, and is well suited to development. - -**Cloud mode** uses `DurableFunctionCloudTestRunner` to invoke deployed Lambda functions and poll for completion. It validates your function's behavior in a real AWS environment, including Lambda runtime behavior, IAM permissions, and service integrations. - -[↑ Back to top](#table-of-contents) - -## Terminology - -**Cloud mode** - Test execution mode that runs tests against deployed Lambda functions in AWS. - -**Local mode** - Test execution mode that runs tests in memory without AWS deployment (default). - -**DurableFunctionCloudTestRunner** - Test runner class that executes durable functions against the AWS Lambda backend. - -**Qualified function name** - Lambda function identifier including version or alias (e.g., `MyFunction:$LATEST`). - -**Polling** - The process of repeatedly checking execution status until completion.
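
Polling as defined in the terminology amounts to a loop with a deadline. The following is a minimal sketch, assuming a hypothetical `get_status` callable and assumed status strings; it does not reproduce how `DurableFunctionCloudTestRunner` is implemented, which this document does not show.

```python
import time
from typing import Callable


def poll_until_complete(
    get_status: Callable[[], str],
    timeout: float,
    interval: float = 1.0,
) -> str:
    """Call get_status() until it reports a terminal status or the timeout elapses."""
    terminal = {"SUCCEEDED", "FAILED"}  # assumed terminal states for this sketch
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in terminal:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Execution did not complete within {timeout} seconds")
        time.sleep(interval)  # wait before checking again
```

Using `time.monotonic()` for the deadline keeps the loop unaffected by wall-clock adjustments, and the status is always checked at least once, even with a zero timeout.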
- -[↑ Back to top](#table-of-contents) - -## Key features - -- **Real AWS environment** - Tests run against actual Lambda functions -- **End-to-end validation** - Verifies deployment, IAM permissions, and service integrations -- **Same test interface** - Tests work in both local and cloud modes without changes -- **Automatic polling** - Waits for execution completion automatically -- **Execution history** - Retrieves full execution history for assertions - -[↑ Back to top](#table-of-contents) - -## Getting started - -Here's a simple example of running tests in cloud mode: - -```python -import pytest -from aws_durable_execution_sdk_python.execution import InvocationStatus -from examples.src import hello_world - - -@pytest.mark.durable_execution( - handler=hello_world.handler, - lambda_function_name="hello world", -) -def test_hello_world(durable_runner): - """Test hello world in both local and cloud modes.""" - with durable_runner: - result = durable_runner.run(input="test", timeout=10) - - assert result.status == InvocationStatus.SUCCEEDED - assert result.result == "Hello World!"
-``` - -Run the test in cloud mode: - -```console -# Set environment variables -export AWS_REGION=us-west-2 -export QUALIFIED_FUNCTION_NAME="HelloWorld:$LATEST" -export LAMBDA_FUNCTION_TEST_NAME="hello world" - -# Run test -pytest --runner-mode=cloud -k test_hello_world -``` - -[↑ Back to top](#table-of-contents) - -## Configuration - -### Environment Variables - -Cloud mode requires these environment variables: - -**Required:** - -- `QUALIFIED_FUNCTION_NAME` - The deployed Lambda function ARN or qualified name - - Example: `MyFunction:$LATEST` - - Example: `arn:aws:lambda:us-west-2:123456789012:function:MyFunction:$LATEST` - -- `LAMBDA_FUNCTION_TEST_NAME` - The function name to match against test markers - - Example: `hello world` - - Must match the `lambda_function_name` parameter in `@pytest.mark.durable_execution` - -**Optional:** - -- `AWS_REGION` - AWS region for Lambda invocation (default: `us-west-2`) - - Example: `us-east-1` - -- `LAMBDA_ENDPOINT` - Custom Lambda endpoint URL for testing - - Example: `https://lambda.us-west-2.amazonaws.com` - - Useful for testing against local Lambda emulators - -### CLI Options - -- `--runner-mode` - Test execution mode - - `local` (default) - Run tests in-memory - - `cloud` - Run tests against deployed Lambda functions - -### Test Markers - -Use the `@pytest.mark.durable_execution` marker to configure tests: - -```python -@pytest.mark.durable_execution( - handler=my_function.handler, # Required for local mode - lambda_function_name="my function", # Required for cloud mode -) -def test_my_function(durable_runner): - # Test code here - pass -``` - -**Parameters:** - -- `handler` - The durable function handler (required for local mode) -- `lambda_function_name` - The function name for cloud mode matching (required for cloud mode) - -[↑ Back to top](#table-of-contents) - -## Deployment workflow - -Follow these steps to deploy and test your durable functions in the cloud: - -### 1. 
Deploy your function - -Deploy your Lambda function to AWS using your preferred deployment tool (SAM, CDK, Terraform, etc.): - -```console -# Example using SAM -sam build -sam deploy --stack-name my-durable-function -``` - -### 2. Get the function ARN - -After deployment, get the qualified function name or ARN: - -```console -# Get function ARN -aws lambda get-function --function-name MyFunction --query 'Configuration.FunctionArn' -``` - -### 3. Set environment variables - -Configure the environment for cloud testing: - -```console -export AWS_REGION=us-west-2 -export QUALIFIED_FUNCTION_NAME="MyFunction:$LATEST" -export LAMBDA_FUNCTION_TEST_NAME="my function" -``` - -### 4. Run tests - -Execute your tests in cloud mode: - -```console -pytest --runner-mode=cloud -k test_my_function -``` - -[↑ Back to top](#table-of-contents) - -## Running tests in different modes - -### Run all tests in local mode (default) - -```console -pytest examples/test/ -``` - -### Run all tests in cloud mode - -```console -pytest --runner-mode=cloud examples/test/ -``` - -### Run specific test in cloud mode - -```console -pytest --runner-mode=cloud -k test_hello_world examples/test/ -``` - -### Run with custom timeout - -Increase the timeout for long-running functions: - -```python -def test_long_running(durable_runner): - with durable_runner: - result = durable_runner.run(input="test", timeout=300) # 5 minutes - - assert result.status == InvocationStatus.SUCCEEDED -``` - -### Mode-specific assertions - -Check the runner mode in your tests: - -```python -def test_with_mode_check(durable_runner): - with durable_runner: - result = durable_runner.run(input="test", timeout=10) - - assert result.status == InvocationStatus.SUCCEEDED - - # Cloud-specific validation - if durable_runner.mode == "cloud": - # Additional assertions for cloud environment - pass -``` - -[↑ Back to top](#table-of-contents) - -## Local vs cloud modes - -### Comparison - -| Feature | Local Mode | Cloud Mode | 
-|---------|-----------|------------| -| **Execution** | In-memory | AWS Lambda | -| **Speed** | Fast (seconds) | Slower (network latency) | -| **AWS credentials** | Not required | Required | -| **Deployment** | Not required | Required | -| **IAM permissions** | Not validated | Validated | -| **Service integrations** | Mocked | Real | -| **Cost** | Free | Lambda invocation costs | -| **Use case** | Development, unit tests | Integration tests, validation | - -### When to use local mode - -Use local mode for: -- **Development** - Fast iteration during development -- **Unit tests** - Testing function logic without AWS dependencies -- **CI/CD** - Fast feedback in pull request checks -- **Debugging** - Easy debugging with local tools - -### When to use cloud mode - -Use cloud mode for: -- **Integration testing** - Validate real AWS service integrations -- **Deployment validation** - Verify deployed functions work correctly -- **IAM testing** - Ensure permissions are configured correctly -- **End-to-end testing** - Test complete workflows in production-like environment - -### Writing mode-agnostic tests - -Write tests that work in both modes: - -```python -@pytest.mark.durable_execution( - handler=my_function.handler, - lambda_function_name="my function", -) -def test_my_function(durable_runner): - """Test works in both local and cloud modes.""" - with durable_runner: - result = durable_runner.run(input={"value": 42}, timeout=10) - - # These assertions work in both modes - assert result.status == InvocationStatus.SUCCEEDED - assert result.result == "expected output" -``` - -[↑ Back to top](#table-of-contents) - -## Best practices - -### Use local mode for development - -Run tests locally during development for fast feedback: - -```console -# Fast local testing -pytest examples/test/ -``` - -### Use cloud mode for validation - -Run cloud tests before merging or deploying: - -```console -# Validate deployment -pytest --runner-mode=cloud examples/test/ -``` - -### Set 
appropriate timeouts - -Cloud tests need longer timeouts due to network latency: - -```python -# Local mode: short timeout -result = runner.run(input="test", timeout=10) - -# Cloud mode: longer timeout -result = runner.run(input="test", timeout=60) -``` - -### Use environment-specific configuration - -Configure different settings for different environments: - -```console -# Development -export AWS_REGION=us-west-2 -export QUALIFIED_FUNCTION_NAME="MyFunction-Dev:$LATEST" - -# Production -export AWS_REGION=us-east-1 -export QUALIFIED_FUNCTION_NAME="MyFunction-Prod:$LATEST" -``` - -### Test one function at a time - -When running cloud tests, test one function at a time to avoid confusion: - -```console -# Test specific function -export LAMBDA_FUNCTION_TEST_NAME="hello world" -pytest --runner-mode=cloud -k test_hello_world -``` - -### Use CI/CD for automated cloud testing - -Integrate cloud testing into your CI/CD pipeline: - -```yaml -# Example GitHub Actions workflow -- name: Deploy function - run: sam deploy --stack-name test-stack - -- name: Run cloud tests - env: - AWS_REGION: us-west-2 - QUALIFIED_FUNCTION_NAME: ${{ steps.deploy.outputs.function_arn }} - LAMBDA_FUNCTION_TEST_NAME: "hello world" - run: pytest --runner-mode=cloud -k test_hello_world -``` - -[↑ Back to top](#table-of-contents) - -## Troubleshooting - -### TimeoutError: Execution did not complete - -**Problem:** Test times out waiting for execution to complete. - -**Cause:** The function takes longer than the timeout value, or the function is stuck. 
- -**Solution:** Increase the timeout parameter: - -```python -# Increase timeout to 120 seconds -result = runner.run(input="test", timeout=120) -``` - -Check the Lambda function logs to see if it's actually running: - -```console -aws logs tail /aws/lambda/MyFunction --follow -``` - -### Environment variables not set - -**Problem:** `Cloud mode requires both QUALIFIED_FUNCTION_NAME and LAMBDA_FUNCTION_TEST_NAME environment variables` - -**Cause:** Required environment variables are missing. - -**Solution:** Set both required environment variables: - -```console -export QUALIFIED_FUNCTION_NAME="MyFunction:$LATEST" -export LAMBDA_FUNCTION_TEST_NAME="hello world" -``` - -### Test skipped: doesn't match LAMBDA_FUNCTION_TEST_NAME - -**Problem:** Test is skipped with message about function name mismatch. - -**Cause:** The test's `lambda_function_name` doesn't match `LAMBDA_FUNCTION_TEST_NAME`. - -**Solution:** Either: -1. Update `LAMBDA_FUNCTION_TEST_NAME` to match the test: - ```console - export LAMBDA_FUNCTION_TEST_NAME="my function" - ``` - -2. Or run only the matching test: - ```console - pytest --runner-mode=cloud -k test_hello_world - ``` - -### Failed to invoke Lambda function - -**Problem:** `Failed to invoke Lambda function MyFunction: ...` - -**Cause:** AWS credentials are invalid, function doesn't exist, or IAM permissions are missing. - -**Solution:** - -1. Verify AWS credentials: - ```console - aws sts get-caller-identity - ``` - -2. Verify function exists: - ```console - aws lambda get-function --function-name MyFunction - ``` - -3. Check IAM permissions - you need `lambda:InvokeFunction` permission: - ```json - { - "Effect": "Allow", - "Action": "lambda:InvokeFunction", - "Resource": "arn:aws:lambda:*:*:function:*" - } - ``` - -### No DurableExecutionArn in response - -**Problem:** `No DurableExecutionArn in response for function MyFunction` - -**Cause:** The Lambda function is not a durable function or doesn't have durable execution enabled. 
- -**Solution:** Ensure your function is decorated with `@durable_execution`: - -```python -from aws_durable_execution_sdk_python import durable_execution, DurableContext - -@durable_execution -def handler(event: dict, context: DurableContext): - # Your durable function code - pass -``` - -### Lambda function failed - -**Problem:** `Lambda function failed: ...` - -**Cause:** The function threw an unhandled exception. - -**Solution:** Check the Lambda function logs: - -```console -aws logs tail /aws/lambda/MyFunction --follow -``` - -Fix the error in your function code and redeploy. - -### Failed to get execution status - -**Problem:** `Failed to get execution status: ...` - -**Cause:** The Lambda service API call failed. - -**Solution:** - -1. Check AWS service health -2. Verify your AWS credentials have the required permissions -3. Check if you're using the correct region: - ```console - export AWS_REGION=us-west-2 - ``` - -[↑ Back to top](#table-of-contents) - -## See also - -- [Getting Started](../getting-started.md) - Set up your development environment -- [Testing patterns](../testing-patterns/basic-tests.md) - Practical pytest examples -- [Examples README](../../examples/test/README.md) - More examples and configuration details - -[↑ Back to top](#table-of-contents) diff --git a/docs/api-reference/.gitkeep b/docs/api-reference/.gitkeep deleted file mode 100644 index 97481357..00000000 --- a/docs/api-reference/.gitkeep +++ /dev/null @@ -1 +0,0 @@ -# This file will be removed once the directory has content diff --git a/docs/architecture.md b/docs/architecture.md deleted file mode 100644 index d6a3180e..00000000 --- a/docs/architecture.md +++ /dev/null @@ -1,505 +0,0 @@ -# Architecture diagrams - -[Back to main index](index.md) - -## Core architecture - -The entry-point that consumers of the SDK interact with is the DurableContext. 
- -### DurableContext operations - -- **Core Methods**: `set_logger`, `step`, `invoke`, `map`, `parallel`, `run_in_child_context`, `wait`, `create_callback`, `wait_for_callback`, `wait_for_condition` -- **Thread Safety**: Uses `OrderedCounter` for generating sequential step IDs -- **State Management**: Delegates to `ExecutionState` for checkpointing - -### Concurrency implementation - -- **Map/Parallel**: Both inherit from `ConcurrentExecutor` abstract base class -- **Thread Pool**: Uses `ThreadPoolExecutor` for concurrent execution -- **State Tracking**: `ExecutableWithState` manages individual task lifecycle -- **Completion Logic**: `ExecutionCounters` tracks success/failure criteria -- **Suspension**: `TimerScheduler` handles timed suspensions and resumptions - -### Configuration system - -- **Modular Configs**: Separate config classes for each operation type -- **Completion Control**: `CompletionConfig` defines success/failure criteria -- **Serialization**: `SerDes` interface for custom serialization - -### Operation handlers - -- **Separation of Concerns**: Each operation has dedicated handler function -- **Checkpointing**: All operations integrate with execution state checkpointing -- **Error Handling**: Consistent error handling and retry logic across operations - -```mermaid -classDiagram - class DurableContext { - -ExecutionState state - -Any lambda_context - -str _parent_id - -OrderedCounter _step_counter - -LogInfo _log_info - -Logger logger - - +set_logger(LoggerInterface new_logger) - +step(Callable func, str name, StepConfig config) T - +invoke(str function_name, P payload, str name, InvokeConfig config) R - +map(Sequence inputs, Callable func, str name, MapConfig config) BatchResult - +parallel(Sequence functions, str name, ParallelConfig config) BatchResult - +run_in_child_context(Callable func, str name, ChildConfig config) T - +wait(int seconds, str name) - +create_callback(str name, CallbackConfig config) Callback - +wait_for_callback(Callable 
submitter, str name, WaitForCallbackConfig config) Any - +wait_for_condition(Callable check, WaitForConditionConfig config, str name) T - } - - class DurableContextProtocol { - <> - +step(Callable func, str name, StepConfig config) T - +run_in_child_context(Callable func, str name, ChildConfig config) T - +map(Sequence inputs, Callable func, str name, MapConfig config) BatchResult - +parallel(Sequence functions, str name, ParallelConfig config) BatchResult - +wait(int seconds, str name) - +create_callback(str name, CallbackConfig config) Callback - } - - class OrderedCounter { - -OrderedLock _lock - -int _counter - +increment() int - +decrement() int - +get_current() int - } - - class ExecutionState { - +str durable_execution_arn - +get_checkpoint_result(str operation_id) CheckpointedResult - +create_checkpoint(OperationUpdate operation_update) - } - - class Logger { - +LoggerInterface logger - +LogInfo info - +with_log_info(LogInfo info) Logger - +from_log_info(LoggerInterface logger, LogInfo info) Logger - } - - DurableContext ..|> DurableContextProtocol : implements - DurableContext --> ExecutionState : uses - DurableContext --> OrderedCounter : contains - DurableContext --> Logger : contains -``` - -## Operation handlers - -The `DurableContext` calls operation handlers, which contain the execution logic for each operation. 
- -```mermaid -classDiagram - class DurableContext { - +step(Callable func, str name, StepConfig config) T - +invoke(str function_name, P payload, str name, InvokeConfig config) R - +map(Sequence inputs, Callable func, str name, MapConfig config) BatchResult - +parallel(Sequence functions, str name, ParallelConfig config) BatchResult - +run_in_child_context(Callable func, str name, ChildConfig config) T - +wait(int seconds, str name) - +create_callback(str name, CallbackConfig config) Callback - +wait_for_callback(Callable submitter, str name, WaitForCallbackConfig config) Any - +wait_for_condition(Callable check, WaitForConditionConfig config, str name) T - } - - class step_handler { - <> - +step_handler(Callable func, ExecutionState state, OperationIdentifier op_id, StepConfig config, Logger logger) T - } - - class invoke_handler { - <> - +invoke_handler(str function_name, P payload, ExecutionState state, OperationIdentifier op_id, InvokeConfig config) R - } - - class map_handler { - <> - +map_handler(Sequence items, Callable func, MapConfig config, ExecutionState state, Callable run_in_child_context) BatchResult - } - - class parallel_handler { - <> - +parallel_handler(Sequence callables, ParallelConfig config, ExecutionState state, Callable run_in_child_context) BatchResult - } - - class child_handler { - <> - +child_handler(Callable func, ExecutionState state, OperationIdentifier op_id, ChildConfig config) T - } - - class wait_handler { - <> - +wait_handler(int seconds, ExecutionState state, OperationIdentifier op_id) - } - - class create_callback_handler { - <> - +create_callback_handler(ExecutionState state, OperationIdentifier op_id, CallbackConfig config) str - } - - class wait_for_callback_handler { - <> - +wait_for_callback_handler(DurableContext context, Callable submitter, str name, WaitForCallbackConfig config) Any - } - - class wait_for_condition_handler { - <> - +wait_for_condition_handler(Callable check, WaitForConditionConfig config, 
ExecutionState state, OperationIdentifier op_id, Logger logger) T - } - - DurableContext --> step_handler : calls - DurableContext --> invoke_handler : calls - DurableContext --> map_handler : calls - DurableContext --> parallel_handler : calls - DurableContext --> child_handler : calls - DurableContext --> wait_handler : calls - DurableContext --> create_callback_handler : calls - DurableContext --> wait_for_callback_handler : calls - DurableContext --> wait_for_condition_handler : calls -``` - -## Configuration module classes - -```mermaid -classDiagram - class StepConfig { - +Callable retry_strategy - +StepSemantics step_semantics - +SerDes serdes - } - - class InvokeConfig~P,R~ { - +int timeout_seconds - +SerDes~P~ serdes_payload - +SerDes~R~ serdes_result - } - - class MapConfig { - +int max_concurrency - +ItemBatcher item_batcher - +CompletionConfig completion_config - +SerDes serdes - } - - class ParallelConfig { - +int max_concurrency - +CompletionConfig completion_config - +SerDes serdes - } - - class ChildConfig~T~ { - +SerDes serdes - +OperationSubType sub_type - +Callable~T,str~ summary_generator - } - - class CallbackConfig { - +int timeout_seconds - +int heartbeat_timeout_seconds - +SerDes serdes - } - - class WaitForCallbackConfig { - +Callable retry_strategy - } - - class WaitForConditionConfig~T~ { - +Callable wait_strategy - +T initial_state - +SerDes serdes - } - - class CompletionConfig { - +int min_successful - +int tolerated_failure_count - +float tolerated_failure_percentage - +first_successful()$ CompletionConfig - +all_completed()$ CompletionConfig - +all_successful()$ CompletionConfig - } - - class ItemBatcher~T~ { - +int max_items_per_batch - +float max_item_bytes_per_batch - +T batch_input - } - - WaitForCallbackConfig --|> CallbackConfig : extends - MapConfig --> CompletionConfig : contains - MapConfig --> ItemBatcher : contains - ParallelConfig --> CompletionConfig : contains -``` - -## Types and protocols module - -```mermaid 
-classDiagram - class DurableContextProtocol { - <> - +step(Callable func, str name, StepConfig config) T - +run_in_child_context(Callable func, str name, ChildConfig config) T - +map(Sequence inputs, Callable func, str name, MapConfig config) BatchResult - +parallel(Sequence functions, str name, ParallelConfig config) BatchResult - +wait(int seconds, str name) - +create_callback(str name, CallbackConfig config) Callback - } - - class LoggerInterface { - <> - +debug(object msg, *args, Mapping extra) - +info(object msg, *args, Mapping extra) - +warning(object msg, *args, Mapping extra) - +error(object msg, *args, Mapping extra) - +exception(object msg, *args, Mapping extra) - } - - class CallbackProtocol~C_co~ { - <> - +str callback_id - +result() C_co - } - - class BatchResultProtocol~T~ { - <> - +get_results() list~T~ - } - - class StepContext { - +LoggerInterface logger - } - - class WaitForConditionCheckContext { - +LoggerInterface logger - } - - class OperationContext { - +LoggerInterface logger - } - - StepContext --|> OperationContext : extends - WaitForConditionCheckContext --|> OperationContext : extends -``` - -## SerDes module classes - -```mermaid -classDiagram - class SerDes~T~ { - <> - +serialize(T value, SerDesContext context) str - +deserialize(str data, SerDesContext context) T - } - - class JsonSerDes~T~ { - +serialize(T value, SerDesContext context) str - +deserialize(str data, SerDesContext context) T - } - - class SerDesContext { - +str operation_id - +str durable_execution_arn - } - - class serialize { - <> - +serialize(SerDes serdes, T value, str operation_id, str durable_execution_arn) str - } - - class deserialize { - <> - +deserialize(SerDes serdes, str data, str operation_id, str durable_execution_arn) T - } - - JsonSerDes ..|> SerDes : implements - serialize --> SerDes : uses - deserialize --> SerDes : uses - SerDes --> SerDesContext : uses -``` - -## Concurrency architecture - map and parallel operations - -```mermaid -classDiagram - 
class ConcurrentExecutor~CallableType,ResultType~ { - <> - +list~Executable~ executables - +int max_concurrency - +CompletionConfig completion_config - +ExecutionCounters counters - +list~ExecutableWithState~ executables_with_state - +Event _completion_event - +SuspendExecution _suspend_exception - - +execute(ExecutionState state, Callable run_in_child_context) BatchResult~ResultType~ - +execute_item(DurableContext child_context, Executable executable)* ResultType - +should_execution_suspend() SuspendResult - -_on_task_complete(ExecutableWithState exe_state, Future future, TimerScheduler scheduler) - -_create_result() BatchResult~ResultType~ - } - - class MapExecutor~T,R~ { - +Sequence~T~ items - +execute_item(DurableContext child_context, Executable executable) R - +from_items(Sequence items, Callable func, MapConfig config)$ MapExecutor - } - - class ParallelExecutor { - +execute_item(DurableContext child_context, Executable executable) R - +from_callables(Sequence callables, ParallelConfig config)$ ParallelExecutor - } - - class Executable~CallableType~ { - +int index - +CallableType func - } - - class ExecutableWithState~CallableType,ResultType~ { - +Executable~CallableType~ executable - -BranchStatus _status - -Future _future - -float _suspend_until - -ResultType _result - -Exception _error - - +run(Future future) - +suspend() - +suspend_with_timeout(float timestamp) - +complete(ResultType result) - +fail(Exception error) - +reset_to_pending() - +can_resume() bool - +is_running() bool - } - - class ExecutionCounters { - +int total_tasks - +int min_successful - +int success_count - +int failure_count - -Lock _lock - - +complete_task() - +fail_task() - +should_complete() bool - +is_all_completed() bool - +is_min_successful_reached() bool - +is_failure_tolerance_exceeded() bool - } - - class TimerScheduler { - +Callable resubmit_callback - -list _pending_resumes - -Lock _lock - -Event _shutdown - -Thread _timer_thread - - +schedule_resume(ExecutableWithState 
exe_state, float resume_time) - +shutdown() - -_timer_loop() - } - - class BatchResult~R~ { - +list~BatchItem~R~~ all - +CompletionReason completion_reason - +succeeded() list~BatchItem~R~~ - +failed() list~BatchItem~R~~ - +get_results() list~R~ - +throw_if_error() - } - - class BatchItem~R~ { - +int index - +BatchItemStatus status - +R result - +ErrorObject error - } - - MapExecutor --|> ConcurrentExecutor : extends - ParallelExecutor --|> ConcurrentExecutor : extends - ConcurrentExecutor --> ExecutableWithState : manages - ConcurrentExecutor --> ExecutionCounters : uses - ConcurrentExecutor --> TimerScheduler : uses - ConcurrentExecutor --> BatchResult : creates - ExecutableWithState --> Executable : contains - BatchResult --> BatchItem : contains -``` - -## Concurrency flow - -```mermaid -sequenceDiagram - participant DC as DurableContext - participant MH as map_handler - participant ME as MapExecutor - participant CE as ConcurrentExecutor - participant TP as ThreadPoolExecutor - participant TS as TimerScheduler - participant EC as ExecutionCounters - - DC->>MH: map(inputs, func, config) - MH->>ME: MapExecutor.from_items() - ME->>CE: execute(state, run_in_child_context) - - CE->>TP: ThreadPoolExecutor(max_workers) - CE->>TS: TimerScheduler(resubmitter) - CE->>EC: ExecutionCounters(total, min_successful) - - loop For each executable - CE->>TP: submit_task(executable_with_state) - TP->>CE: execute_item_in_child_context() - CE->>DC: run_in_child_context(child_func) - DC->>ME: execute_item(child_context, executable) - end - - par Task Completion Handling - TP->>CE: on_task_complete(future) - CE->>EC: complete_task() / fail_task() - CE->>CE: should_execution_suspend() - alt Should Complete - CE->>CE: _completion_event.set() - else Should Suspend - CE->>TS: schedule_resume(exe_state, timestamp) - end - end - - CE->>CE: _completion_event.wait() - CE->>CE: _create_result() - CE->>DC: BatchResult -``` - -## Threading and locking - -```mermaid -classDiagram - class 
OrderedLock { - -Lock _lock - -deque~Event~ _waiters - -bool _is_broken - -Exception _exception - - +acquire() bool - +release() - +reset() - +is_broken() bool - +__enter__() OrderedLock - +__exit__(exc_type, exc_val, exc_tb) - } - - class OrderedCounter { - -OrderedLock _lock - -int _counter - - +increment() int - +decrement() int - +get_current() int - } -``` - -[Back to top](#architecture-diagrams) diff --git a/docs/best-practices.md b/docs/best-practices.md deleted file mode 100644 index b717544e..00000000 --- a/docs/best-practices.md +++ /dev/null @@ -1,850 +0,0 @@ -# Best Practices - -## Table of Contents - -- [Overview](#overview) -- [Function design](#function-design) -- [Timeout configuration](#timeout-configuration) -- [Naming conventions](#naming-conventions) -- [Performance optimization](#performance-optimization) -- [Serialization](#serialization) -- [Common mistakes](#common-mistakes) -- [Code organization](#code-organization) -- [FAQ](#faq) -- [See also](#see-also) - -[← Back to main index](index.md) - -## Overview - -This guide covers best practices for building reliable, maintainable durable functions. You'll learn how to design functions that are easy to test, debug, and maintain in production. - -[↑ Back to top](#table-of-contents) - -## Function design - -### Keep functions focused - -Each durable function should have a single, clear purpose. Focused functions are easier to test, debug, and maintain. They also make it simpler to understand execution flow and identify failures. 
- -**Good:** - -```python -@durable_execution -def process_order(event: dict, context: DurableContext) -> dict: - """Process a single order through validation, payment, and fulfillment.""" - order_id = event["order_id"] - - validation = context.step(validate_order(order_id)) - payment = context.step(process_payment(order_id, event["amount"])) - fulfillment = context.step(fulfill_order(order_id)) - - return {"order_id": order_id, "status": "completed"} -``` - -**Avoid:** - -```python -@durable_execution -def process_everything(event: dict, context: DurableContext) -> dict: - """Process orders, update inventory, send emails, generate reports...""" - # Too many responsibilities - hard to test and maintain - # If one part fails, the entire function needs to retry - pass -``` - -### Wrap non-deterministic code in steps - -All non-deterministic operations must be wrapped in steps: - -```python -@durable_step -def get_timestamp(step_context: StepContext) -> int: - return int(time.time()) - -@durable_step -def generate_id(step_context: StepContext) -> str: - return str(uuid.uuid4()) - -@durable_execution -def lambda_handler(event: dict, context: DurableContext) -> dict: - timestamp = context.step(get_timestamp()) - request_id = context.step(generate_id()) - return {"timestamp": timestamp, "request_id": request_id} -``` - -**Why:** Non-deterministic code produces different values on replay, breaking state consistency. - - -### Use @durable_step for reusable functions - -Decorate functions with `@durable_step` to get automatic naming, better code organization, and cleaner syntax. This makes your code more maintainable and easier to test. 
- -**Good:** - -```python -@durable_step -def validate_input(step_context: StepContext, data: dict) -> bool: - return all(key in data for key in ["name", "email"]) - -@durable_execution -def lambda_handler(event: dict, context: DurableContext) -> dict: - is_valid = context.step(validate_input(event)) - return {"valid": is_valid} -``` - -**Avoid:** - -```python -@durable_execution -def lambda_handler(event: dict, context: DurableContext) -> dict: - # Inline Python lambdas need an explicit name and are harder to test - is_valid = context.step( - lambda _: all(key in event for key in ["name", "email"]), - name="validate_input" - ) - return {"valid": is_valid} -``` - -### Don't share state between steps - -Pass data through return values, not global variables or class attributes. Global state breaks on replay: completed steps return their cached results without re-running, so any global variables they set stay at their initial values. - -**Good:** - -```python -@durable_step -def fetch_user(step_context: StepContext, user_id: str) -> dict: - return {"user_id": user_id, "name": "Jane Doe"} - -@durable_step -def send_email(step_context: StepContext, user: dict) -> bool: - send_to_address(user["name"], user.get("email")) - return True - -@durable_execution -def lambda_handler(event: dict, context: DurableContext) -> dict: - user = context.step(fetch_user(event["user_id"])) - sent = context.step(send_email(user)) - return {"sent": sent} -``` - -**Avoid:** - -```python -# DON'T: Global state -current_user = None - -@durable_step -def fetch_user(step_context: StepContext, user_id: str) -> dict: - global current_user - current_user = {"user_id": user_id, "name": "Jane Doe"} - return current_user - -@durable_step -def send_email(step_context: StepContext) -> bool: - # On replay, current_user might be None! 
- send_to_address(current_user["name"], current_user.get("email")) - return True - -@durable_execution -def lambda_handler(event: dict, context: DurableContext) -> dict: - # First execution: works fine - # On replay: fetch_user returns cached result but doesn't set global variable - # send_email crashes because current_user is None - user = context.step(fetch_user(event["user_id"])) - sent = context.step(send_email()) - return {"sent": sent} -``` - -### Choose the right execution semantics - -Use at-most-once semantics for operations with side effects (payments, emails, database writes) to prevent duplicate execution. Use at-least-once (default) for idempotent operations that are safe to retry. - -**At-most-once for side effects:** - -```python -from aws_durable_execution_sdk_python.config import StepConfig, StepSemantics - -@durable_step -def charge_credit_card(step_context: StepContext, amount: float) -> dict: - return {"transaction_id": "txn_123", "status": "completed"} - -@durable_execution -def lambda_handler(event: dict, context: DurableContext) -> dict: - # Prevent duplicate charges on retry - payment = context.step( - charge_credit_card(event["amount"]), - config=StepConfig(step_semantics=StepSemantics.AT_MOST_ONCE_PER_RETRY), - ) - return payment -``` - -**At-least-once for idempotent operations:** - -```python -@durable_step -def calculate_total(step_context: StepContext, items: list) -> float: - return sum(item["price"] for item in items) - -@durable_execution -def lambda_handler(event: dict, context: DurableContext) -> float: - # Safe to run multiple times - same input produces same output - total = context.step(calculate_total(event["items"])) - return total -``` - -### Handle errors explicitly - -Catch and handle exceptions in your step functions. Distinguish between transient failures (network issues, rate limits) that should retry, and permanent failures (invalid input, not found) that shouldn't. 
- -**Good:** - -```python -@durable_step -def call_external_api(step_context: StepContext, url: str) -> dict: - try: - response = requests.get(url, timeout=10) - response.raise_for_status() - return response.json() - except requests.Timeout: - raise # Let retry handle timeouts - except requests.HTTPError as e: - if e.response.status_code >= 500: - raise # Retry server errors - # Don't retry client errors (400-499) - return {"error": "client_error", "status": e.response.status_code} -``` - -**Avoid:** - -```python -@durable_step -def call_external_api(step_context: StepContext, url: str) -> dict: - # No error handling - all errors cause retry, even permanent ones - response = requests.get(url) - return response.json() -``` - -[↑ Back to top](#table-of-contents) - -## Timeout configuration - -### Set realistic timeouts - -Choose timeout values based on expected execution time plus buffer for retries and network delays. Too short causes unnecessary failures; too long wastes resources waiting for operations that won't complete. - -**Good:** - -```python -from aws_durable_execution_sdk_python.config import CallbackConfig, Duration - -@durable_execution -def lambda_handler(event: dict, context: DurableContext) -> dict: - # Expected 2 minutes + 1 minute buffer = 3 minutes - callback = context.create_callback( - name="approval", - config=CallbackConfig(timeout=Duration.from_minutes(3)), - ) - return {"callback_id": callback.callback_id} -``` - -**Avoid:** - -```python -@durable_execution -def lambda_handler(event: dict, context: DurableContext) -> dict: - # Too short - will timeout before external system responds - callback = context.create_callback( - name="approval", - config=CallbackConfig(timeout=Duration.from_seconds(5)), - ) - return {"callback_id": callback.callback_id} -``` - -### Use heartbeat timeouts for long operations - -Enable heartbeat monitoring for callbacks that take more than a few minutes. 
Heartbeats detect when external systems stop responding, preventing you from waiting the full timeout period. - -```python -callback = context.create_callback( - name="approval", - config=CallbackConfig( - timeout=Duration.from_hours(24), # Maximum wait time - heartbeat_timeout=Duration.from_hours(2), # Fail if no heartbeat for 2 hours - ), -) -``` - -Without heartbeat monitoring, you'd wait the full 24 hours even if the external system crashes after 10 minutes. - -### Configure retry delays appropriately - -```python -from aws_durable_execution_sdk_python.retries import RetryStrategyConfig - -# Fast retry for transient network issues -fast_retry = RetryStrategyConfig( - max_attempts=3, - initial_delay_seconds=1, - max_delay_seconds=5, - backoff_rate=2.0, -) - -# Slow retry for rate limiting -slow_retry = RetryStrategyConfig( - max_attempts=5, - initial_delay_seconds=10, - max_delay_seconds=60, - backoff_rate=2.0, -) -``` - -[↑ Back to top](#table-of-contents) - -## Naming conventions - -### Use descriptive operation names - -Choose names that explain what the operation does, not how it does it. Good names make logs easier to read and help you identify which operation failed. 
- -**Good:** - -```python -@durable_execution -def lambda_handler(event: dict, context: DurableContext) -> dict: - user = context.step(fetch_user(event["user_id"]), name="fetch_user") - validated = context.step(validate_user(user), name="validate_user") - notification = context.step(send_notification(user), name="send_notification") - return {"status": "completed"} -``` - -**Avoid:** - -```python -@durable_execution -def lambda_handler(event: dict, context: DurableContext) -> dict: - # Generic names don't help with debugging - user = context.step(fetch_user(event["user_id"]), name="step1") - validated = context.step(validate_user(user), name="step2") - notification = context.step(send_notification(user), name="step3") - return {"status": "completed"} -``` - -### Use consistent naming patterns - -```python -# Pattern: verb_noun for operations -context.step(validate_order(order_id), name="validate_order") -context.step(process_payment(amount), name="process_payment") - -# Pattern: noun_action for callbacks -context.create_callback(name="payment_callback") -context.create_callback(name="approval_callback") - -# Pattern: descriptive_wait for waits -context.wait(Duration.from_seconds(30), name="payment_confirmation_wait") -``` - -### Name dynamic operations with context - -Include context when creating operations in loops: - -```python -@durable_execution -def lambda_handler(event: dict, context: DurableContext) -> list: - results = [] - for i, item in enumerate(event["items"]): - result = context.step( - process_item(item), - name=f"process_item_{i}_{item['id']}" - ) - results.append(result) - return results -``` - -[↑ Back to top](#table-of-contents) - -## Performance optimization - -### Minimize checkpoint size - -Keep operation inputs and results small. Large payloads increase checkpoint overhead, slow down execution, and can hit size limits. Store large data in S3 and pass references instead. 
- -**Good:** - -```python -@durable_step -def process_large_dataset(step_context: StepContext, s3_key: str) -> str: - data = download_from_s3(s3_key) - result = process_data(data) - result_key = upload_to_s3(result) - return result_key # Small checkpoint - just the S3 key - -@durable_execution -def lambda_handler(event: dict, context: DurableContext) -> dict: - result_key = context.step(process_large_dataset(event["s3_key"])) - return {"result_key": result_key} -``` - -**Avoid:** - -```python -@durable_step -def process_large_dataset(step_context: StepContext, data: list) -> list: - return process_data(data) # Large checkpoint! - -@durable_execution -def lambda_handler(event: dict, context: DurableContext) -> dict: - # Passing megabytes of data through checkpoints - large_data = download_from_s3(event["s3_key"]) - result = context.step(process_large_dataset(large_data)) - return {"result": result} # Another large checkpoint! -``` - -### Batch operations when possible - -Group related operations to reduce checkpoint overhead. Each step creates a checkpoint, so batching reduces API calls and speeds up execution. - -**Good:** - -```python -@durable_step -def process_batch(step_context: StepContext, items: list) -> list: - return [process_item(item) for item in items] - -@durable_execution -def lambda_handler(event: dict, context: DurableContext) -> list: - items = event["items"] - results = [] - - # Process 10 items per step instead of 1 - for i in range(0, len(items), 10): - batch = items[i:i+10] - batch_results = context.step( - process_batch(batch), - name=f"process_batch_{i//10}" - ) - results.extend(batch_results) - - return results -``` - -**Avoid:** - -```python -@durable_execution -def lambda_handler(event: dict, context: DurableContext) -> list: - results = [] - # Creating a step for each item - too many checkpoints! 
- for i, item in enumerate(event["items"]): - result = context.step( - lambda _, item=item: process_item(item), - name=f"process_item_{i}" - ) - results.append(result) - return results -``` - -### Use parallel operations for independent work - -Execute independent operations concurrently to reduce total execution time. Use `context.parallel()` to run multiple operations at the same time. - -**Good:** - -```python -@durable_execution -def lambda_handler(event: dict, context: DurableContext) -> dict: - # Execute all three operations concurrently - results = context.parallel( - fetch_user_data(event["user_id"]), - fetch_order_history(event["user_id"]), - fetch_preferences(event["user_id"]), - ) - - return { - "user": results[0], - "orders": results[1], - "preferences": results[2], - } -``` - -**Avoid:** - -```python -@durable_execution -def lambda_handler(event: dict, context: DurableContext) -> dict: - # Sequential execution - each step waits for the previous one - user_data = context.step(fetch_user_data(event["user_id"])) - order_history = context.step(fetch_order_history(event["user_id"])) - preferences = context.step(fetch_preferences(event["user_id"])) - - return { - "user": user_data, - "orders": order_history, - "preferences": preferences, - } -``` - -### Avoid unnecessary waits - -Only use waits when you need to delay execution: - -```python -@durable_execution -def lambda_handler(event: dict, context: DurableContext) -> dict: - job_id = context.step(start_job(event["data"])) - context.wait(Duration.from_seconds(30), name="job_processing_wait") # Necessary - result = context.step(check_job_status(job_id)) - return result -``` - -[↑ Back to top](#table-of-contents) - -## Serialization - -### Use JSON-serializable types - -The SDK uses JSON serialization by default for checkpoints. Stick to JSON-compatible types (dict, list, str, int, float, bool, None) for operation inputs and results. 
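One quick way to catch a non-serializable result before it reaches a checkpoint is a standard-library check. The sketch below assumes that Python's default `json` behavior approximates the SDK's default serializer, which this document does not confirm in detail; `is_json_safe` is a hypothetical helper, not part of the SDK.

```python
import json
from datetime import datetime
from decimal import Decimal
from typing import Any


def is_json_safe(value: Any) -> bool:
    """Return True if the standard json module can serialize value."""
    try:
        json.dumps(value)
    except (TypeError, ValueError):
        # TypeError: unsupported type; ValueError: e.g. circular references
        return False
    return True


# Plain dicts of str/float values pass; Decimal and datetime do not.
plain_ok = is_json_safe({"order_id": "o-1", "total": 99.99})
decimal_ok = is_json_safe({"total": Decimal("99.99")})
datetime_ok = is_json_safe({"timestamp": datetime.now()})
```

A check like this fits naturally in a unit test for each step function, so serialization problems surface locally rather than during a deployed execution.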
- -**Good:** - -```python -@durable_step -def process_order(step_context: StepContext, order: dict) -> dict: - return { - "order_id": order["id"], - "total": 99.99, - "items": ["item1", "item2"], - "processed": True, - } -``` - -**Avoid:** - -```python -from datetime import datetime -from decimal import Decimal - -@durable_step -def process_order(step_context: StepContext, order: dict) -> dict: - # datetime and Decimal aren't JSON-serializable by default - return { - "order_id": order["id"], - "total": Decimal("99.99"), # Won't serialize! - "timestamp": datetime.now(), # Won't serialize! - } -``` - -### Convert non-serializable types - -Convert complex types to JSON-compatible formats before returning from steps: - -```python -from datetime import datetime -from decimal import Decimal - -@durable_step -def process_order(step_context: StepContext, order: dict) -> dict: - return { - "order_id": order["id"], - "total": float(Decimal("99.99")), # Convert to float - "timestamp": datetime.now().isoformat(), # Convert to string - } -``` - -### Use custom serialization for complex types - -For complex objects, implement custom serialization or use the SDK's SerDes system: - -```python -from dataclasses import dataclass, asdict - -@dataclass -class Order: - order_id: str - total: float - items: list - -@durable_step -def process_order(step_context: StepContext, order_data: dict) -> dict: - order = Order(**order_data) - # Process order... 
- return asdict(order) # Convert dataclass to dict -``` - -[↑ Back to top](#table-of-contents) - -## Common mistakes - -### ⚠️ Modifying mutable objects between steps - -**Wrong:** - -```python -@durable_execution -def lambda_handler(event: dict, context: DurableContext) -> dict: - data = {"count": 0} - context.step(increment_count(data)) - data["count"] += 1 # DON'T: Mutation outside step - return data -``` - -**Right:** - -```python -@durable_execution -def lambda_handler(event: dict, context: DurableContext) -> dict: - data = {"count": 0} - data = context.step(increment_count(data)) - data = context.step(increment_count(data)) - return data -``` - -### ⚠️ Using context inside its own operations - -**Wrong:** - -```python -@durable_step -def process_with_wait(step_context: StepContext, context: DurableContext) -> str: - # DON'T: Can't use context inside its own step operation - context.wait(Duration.from_seconds(1)) # Error: using context inside step! - result = context.step(nested_step(), name="step2") # Error: nested context.step! 
- return result - -@durable_execution -def lambda_handler(event: dict, context: DurableContext) -> dict: - # This will fail - context is being used inside its own step - result = context.step(process_with_wait(context), name="step1") - return {"result": result} -``` - -**Right:** - -```python -@durable_step -def nested_step(step_context: StepContext) -> str: - return "nested step" - -@durable_with_child_context -def process_with_wait(child_ctx: DurableContext) -> str: - # Use child context for nested operations - child_ctx.wait(seconds=1) - result = child_ctx.step(nested_step(), name="step2") - return result - -@durable_execution -def lambda_handler(event: dict, context: DurableContext) -> dict: - # Use run_in_child_context for nested operations - result = context.run_in_child_context( - process_with_wait(), - name="block1" - ) - return {"result": result} -``` - -**Why:** You can't use a context object inside its own operations (like calling `context.step()` inside another `context.step()`). Use child contexts to create isolated execution scopes for nested operations. - -### ⚠️ Forgetting to handle callback timeouts - -**Wrong:** - -```python -@durable_execution -def lambda_handler(event: dict, context: DurableContext) -> dict: - callback = context.create_callback(name="approval") - result = callback.result() - return {"approved": result["approved"]} # Crashes if timeout! 
-``` - -**Right:** - -```python -@durable_execution -def lambda_handler(event: dict, context: DurableContext) -> dict: - callback = context.create_callback(name="approval") - result = callback.result() - - if result is None: - return {"status": "timeout", "approved": False} - - return {"status": "completed", "approved": result.get("approved", False)} -``` - -### ⚠️ Creating too many small steps - -**Wrong:** - -```python -@durable_execution -def lambda_handler(event: dict, context: DurableContext) -> dict: - a = context.step(lambda _: event["a"]) - b = context.step(lambda _: event["b"]) - sum_val = context.step(lambda _: a + b) - return {"result": sum_val} -``` - -**Right:** - -```python -@durable_step -def calculate_result(step_context: StepContext, a: int, b: int) -> int: - return a + b - -@durable_execution -def lambda_handler(event: dict, context: DurableContext) -> dict: - result = context.step(calculate_result(event["a"], event["b"])) - return {"result": result} -``` - -### ⚠️ Not using retry for transient failures - -**Right:** - -```python -from aws_durable_execution_sdk_python.config import StepConfig -from aws_durable_execution_sdk_python.retries import ( - RetryStrategyConfig, - create_retry_strategy, -) - -@durable_step -def call_api(step_context: StepContext, url: str) -> dict: - response = requests.get(url, timeout=10) - response.raise_for_status() - return response.json() - -@durable_execution -def lambda_handler(event: dict, context: DurableContext) -> dict: - retry_config = RetryStrategyConfig( - max_attempts=3, - retryable_error_types=[requests.Timeout, requests.ConnectionError], - ) - - result = context.step( - call_api(event["url"]), - config=StepConfig(retry_strategy=create_retry_strategy(retry_config)), - ) - return result -``` - -[↑ Back to top](#table-of-contents) - -## Code organization - -### Separate business logic from orchestration - -```python -# business_logic.py -@durable_step -def validate_order(step_context: StepContext, order: 
dict) -> dict: - if not order.get("items"): - raise ValueError("Order must have items") - return {**order, "validated": True} - -# handler.py -@durable_execution -def lambda_handler(event: dict, context: DurableContext) -> dict: - order = event["order"] - validated_order = context.step(validate_order(order)) - return {"status": "completed", "order_id": validated_order["order_id"]} -``` - -### Use child contexts for complex workflows - -```python -@durable_with_child_context -def validate_and_enrich(ctx: DurableContext, data: dict) -> dict: - validated = ctx.step(validate_data(data)) - enriched = ctx.step(enrich_data(validated)) - return enriched - -@durable_execution -def lambda_handler(event: dict, context: DurableContext) -> dict: - enriched = context.run_in_child_context( - validate_and_enrich(event["data"]), - name="validation_phase", - ) - return enriched -``` - -### Group related configuration - -```python -# config.py -from aws_durable_execution_sdk_python.config import StepConfig -from aws_durable_execution_sdk_python.retries import ( - RetryStrategyConfig, - create_retry_strategy, -) - -FAST_RETRY = StepConfig( - retry_strategy=create_retry_strategy( - RetryStrategyConfig( - max_attempts=3, - initial_delay_seconds=1, - max_delay_seconds=5, - backoff_rate=2.0, - ) - ) -) - -# handler.py -from config import FAST_RETRY - -@durable_execution -def lambda_handler(event: dict, context: DurableContext) -> dict: - data = context.step(fetch_data(event["id"]), config=FAST_RETRY) - return data -``` - -[↑ Back to top](#table-of-contents) - -## FAQ - -**Q: How many steps should a durable function have?** - -A: There's a limit of 3,000 operations per execution. Keep in mind that more steps mean more API operations and longer execution time. Balance granularity with performance - group related operations when it makes sense, but don't hesitate to break down complex logic into steps. - -**Q: Should I create a step for every function call?** - -A: No. 
Only create steps for operations that need checkpointing, retry logic, or isolation. - -**Q: Can I use async/await in durable functions?** - -A: Functions decorated with `@durable_step` must be synchronous. If you need to call async code, use `asyncio.run()` inside your step to execute it synchronously. - -**Q: How do I handle secrets and credentials?** - -A: Use AWS Secrets Manager or Parameter Store. Fetch secrets in a step at the beginning of your workflow. - -**Q: What's the maximum execution time for a durable function?** - -A: Durable functions can run for days or weeks using waits and callbacks. Each individual Lambda invocation is still subject to the 15-minute Lambda timeout. - -**Q: How do I test durable functions locally?** - -A: Use the testing SDK (`aws-durable-execution-sdk-python-testing`) to run functions locally without AWS credentials. See [Testing patterns](testing-patterns/basic-tests.md) for examples. - -**Q: How do I monitor durable functions in production?** - -A: Use CloudWatch Logs for execution logs, CloudWatch Metrics for performance metrics, and X-Ray for distributed tracing. - -[↑ Back to top](#table-of-contents) - -## See also - -- [Getting started](getting-started.md) - Build your first durable function -- [Steps](core/steps.md) - Step operations -- [Error handling](advanced/error-handling.md) - Handle failures -- [Configuration](api-reference/config.md) - Configuration options -- [Testing patterns](testing-patterns/basic-tests.md) - How to test your functions - -[↑ Back to top](#table-of-contents) - -## License - -See the [LICENSE](../LICENSE) file for our project's licensing. 
- -[↑ Back to top](#table-of-contents) diff --git a/docs/core/callbacks.md b/docs/core/callbacks.md deleted file mode 100644 index b2001775..00000000 --- a/docs/core/callbacks.md +++ /dev/null @@ -1,877 +0,0 @@ -# Callbacks - -## Table of Contents - -- [Terminology](#terminology) -- [What are callbacks?](#what-are-callbacks) -- [Key features](#key-features) -- [Getting started](#getting-started) -- [Method signatures](#method-signatures) -- [Configuration](#configuration) -- [Waiting for callbacks](#waiting-for-callbacks) -- [Integration patterns](#integration-patterns) -- [Advanced patterns](#advanced-patterns) -- [Best practices](#best-practices) -- [FAQ](#faq) -- [Testing](#testing) -- [See also](#see-also) - -[← Back to main index](../index.md) - -## Terminology - -**Callback** - A mechanism that pauses execution and waits for an external system to provide a result. Created using `context.create_callback()`. - -**Callback ID** - A unique identifier for a callback that you send to external systems. The external system uses this ID to send the result back. - -**Callback timeout** - The maximum time to wait for a callback response. If the timeout expires without a response, the callback fails. - -**Heartbeat timeout** - The maximum time between heartbeat signals from the external system. Use this to detect when external systems stop responding. - -**Wait for callback** - The operation that pauses execution until the callback receives a result. Created using `context.wait_for_callback()`. - -[↑ Back to top](#table-of-contents) - -## What are callbacks? - -Callbacks let your durable function pause and wait for external systems to respond. When you create a callback, you get a unique callback ID that you can send to external systems like approval workflows, payment processors, or third-party APIs. Your function pauses until the external system calls back with a result. 
- -Use callbacks to: -- Wait for human approvals in workflows -- Integrate with external payment systems -- Coordinate with third-party APIs -- Handle long-running external processes -- Implement request-response patterns with external systems - -[↑ Back to top](#table-of-contents) - -## Key features - -- **External system integration** - Pause execution and wait for external responses -- **Unique callback IDs** - Each callback gets a unique identifier for routing -- **Configurable timeouts** - Set maximum wait times and heartbeat intervals -- **Type-safe results** - Callbacks are generic and preserve result types -- **Automatic checkpointing** - Callback results are saved automatically -- **Heartbeat monitoring** - Detect when external systems stop responding - -[↑ Back to top](#table-of-contents) - -## Getting started - -Callbacks let you pause your durable function while waiting for an external system to respond. Think of it like this: - -**Your durable function:** -1. Creates a callback and gets a unique `callback_id` -2. Sends the `callback_id` to an external system (payment processor, approval system, etc.) -3. Calls `callback.result()` - execution pauses here ⏸️ -4. When the callback is notified, execution resumes ▶️ - -**Your notification handler** (separate Lambda or service): -1. Receives the result from the external system (via webhook, queue, etc.) -2. Calls AWS Lambda API `SendDurableExecutionCallbackSuccess` with the `callback_id` -3. This wakes up your durable function - -The key insight: callbacks need two pieces working together - one that waits, and one that notifies.
- -### Basic example - -Here's a simple example showing the durable function side: - -```python -from typing import Any -from aws_durable_execution_sdk_python import DurableContext, durable_execution -from aws_durable_execution_sdk_python.config import CallbackConfig, Duration - -@durable_execution -def handler(event: Any, context: DurableContext) -> dict: - """Create a callback and wait for external system response.""" - # Step 1: Create the callback - callback_config = CallbackConfig( - timeout=Duration.from_minutes(2), - heartbeat_timeout=Duration.from_seconds(60), - ) - - callback = context.create_callback( - name="example_callback", - config=callback_config, - ) - - # Step 2: Send callback ID to external system - # In a real scenario, you'd send this to a third-party API, - # message queue, or webhook endpoint - send_to_external_system({ - "callback_id": callback.callback_id, - "data": event.get("data"), - }) - - # Step 3: Wait for the result - execution suspends here - result = callback.result() - - # Step 4: Execution resumes when result is received - return { - "status": "completed", - "result": result, - } -``` - -### Notifying the callback - -When your external system finishes processing, you need to notify the callback using AWS Lambda APIs. 
You have three options: - -**send_durable_execution_callback_success** - Notify success with a result: - -```python -import boto3 -import json - -lambda_client = boto3.client('lambda') - -# When external system succeeds -callback_id = "abc123-callback-id-from-durable-function" -result_data = json.dumps({'status': 'approved', 'amount': 1000}).encode('utf-8') - -lambda_client.send_durable_execution_callback_success( - CallbackId=callback_id, - Result=result_data -) -``` - -**send_durable_execution_callback_failure** - Notify failure with an error: - -```python -# When external system fails -callback_id = "abc123-callback-id-from-durable-function" - -lambda_client.send_durable_execution_callback_failure( - CallbackId=callback_id, - Error={ - 'ErrorType': 'PaymentDeclined', - 'ErrorMessage': 'Insufficient funds' - } -) -``` - -**send_durable_execution_callback_heartbeat** - Send heartbeat to keep callback alive: - -```python -# Send heartbeat for long-running operations -callback_id = "abc123-callback-id-from-durable-function" - -lambda_client.send_durable_execution_callback_heartbeat( - CallbackId=callback_id -) -``` - -### Complete example with message broker - -Here's a complete example showing both sides of the callback flow: - -```python -# Durable function side -@durable_execution -def handler(event: dict, context: DurableContext) -> dict: - """Process payment with external payment processor.""" - # Create callback - callback = context.create_callback( - name="payment_callback", - config=CallbackConfig(timeout=Duration.from_minutes(5)), - ) - - # Send to message broker (SQS, SNS, EventBridge, etc.) 
- send_to_payment_queue({ - "callback_id": callback.callback_id, - "amount": event["amount"], - "customer_id": event["customer_id"], - }) - - # Wait for result - execution suspends here - payment_result = callback.result() - - # Execution resumes here when callback is notified - return { - "payment_status": payment_result.get("status"), - "transaction_id": payment_result.get("transaction_id"), - } -``` - -```python -# Message processor side (separate Lambda or service) -import boto3 -import json - -lambda_client = boto3.client('lambda') - -def process_payment_message(event: dict): - """Process payment and notify callback.""" - callback_id = event["callback_id"] - amount = event["amount"] - customer_id = event["customer_id"] - - try: - # Process payment with external system - result = payment_processor.charge(customer_id, amount) - - # Notify success - result_data = json.dumps({ - 'status': 'completed', - 'transaction_id': result.transaction_id, - }).encode('utf-8') - - lambda_client.send_durable_execution_callback_success( - CallbackId=callback_id, - Result=result_data - ) - except PaymentError as e: - # Notify failure - lambda_client.send_durable_execution_callback_failure( - CallbackId=callback_id, - Error={ - 'ErrorType': 'PaymentError', - 'ErrorMessage': f'{e.error_code}: {str(e)}' - } - ) -``` - -### Key points - -- **Callbacks require two parts**: Your durable function creates the callback, and a separate process notifies the result -- **Use Lambda APIs to notify**: `SendDurableExecutionCallbackSuccess`, `SendDurableExecutionCallbackFailure`, or `SendDurableExecutionCallbackHeartbeat` -- **Execution suspends at `callback.result()`**: Your function stops running and doesn't consume resources while waiting -- **Execution resumes when notified**: When you call the Lambda API with the callback ID, your function resumes from where it suspended -- **Heartbeats keep callbacks alive**: For long operations, send heartbeats to prevent timeout - -[↑ Back to 
top](#table-of-contents) - -## Method signatures - -### context.create_callback() - -```python -def create_callback( - name: str | None = None, - config: CallbackConfig | None = None, -) -> Callback[T] -``` - -**Parameters:** - -- `name` (optional) - A name for the callback, useful for debugging and testing -- `config` (optional) - A `CallbackConfig` object to configure timeout behavior - -**Returns:** A `Callback` object with a `callback_id` property - -**Type parameter:** `T` - The type of result the callback will receive - -### callback.callback_id - -```python -callback_id: str -``` - -A unique identifier for this callback. Send this ID to external systems so they can return results. - -### callback.result() - -```python -def result() -> T | None -``` - -Returns the callback result. Blocks until the result is available or the callback times out. - -[↑ Back to top](#table-of-contents) - -## Configuration - -Configure callback behavior using `CallbackConfig`: - -```python -from aws_durable_execution_sdk_python import DurableContext, durable_execution -from aws_durable_execution_sdk_python.config import CallbackConfig, Duration - -@durable_execution -def handler(event: dict, context: DurableContext) -> str: - # Configure callback with custom timeouts - config = CallbackConfig( - timeout=Duration.from_seconds(60), - heartbeat_timeout=Duration.from_seconds(30), - ) - - callback = context.create_callback( - name="timeout_callback", - config=config, - ) - - return f"Callback created with 60s timeout: {callback.callback_id}" -``` - -### CallbackConfig parameters - -**timeout** - Maximum time to wait for the callback response. Use `Duration` helpers to specify: -- `Duration.from_seconds(60)` - 60 seconds -- `Duration.from_minutes(5)` - 5 minutes -- `Duration.from_hours(2)` - 2 hours -- `Duration.from_days(1)` - 1 day - -**heartbeat_timeout** - Maximum time between heartbeat signals from the external system. 
If the external system doesn't send a heartbeat within this interval, the callback fails. Set to 0 or omit to disable heartbeat monitoring. - -**serdes** (optional) - Custom serialization/deserialization for the callback result. If not provided, uses JSON serialization. - -### Duration helpers - -The `Duration` class provides convenient methods for specifying timeouts: - -```python -from aws_durable_execution_sdk_python.config import Duration - -# Various ways to specify duration -timeout_60s = Duration.from_seconds(60) -timeout_5m = Duration.from_minutes(5) -timeout_2h = Duration.from_hours(2) -timeout_1d = Duration.from_days(1) - -# Use in CallbackConfig -config = CallbackConfig( - timeout=Duration.from_hours(2), - heartbeat_timeout=Duration.from_minutes(15), -) -``` - -[↑ Back to top](#table-of-contents) - -## Waiting for callbacks - -After creating a callback, you typically wait for its result. There are two ways to do this: - -### Using callback.result() - -Call `result()` on the callback object to wait for the response: - -```python -from aws_durable_execution_sdk_python import DurableContext, durable_execution -from aws_durable_execution_sdk_python.config import CallbackConfig, Duration - -@durable_execution -def handler(event: dict, context: DurableContext) -> dict: - # Create callback - callback = context.create_callback( - name="approval_callback", - config=CallbackConfig(timeout=Duration.from_hours(24)), - ) - - # Send callback ID to approval system - send_approval_request(callback.callback_id, event["request_details"]) - - # Wait for approval response - approval_result = callback.result() - - if approval_result and approval_result.get("approved"): - return {"status": "approved", "details": approval_result} - else: - return {"status": "rejected"} -``` - -### Using context.wait_for_callback() - -Alternatively, use `wait_for_callback()` to wait for a callback by its ID: - -```python -@durable_execution -def handler(event: dict, context: DurableContext) -> 
dict: - # Create callback - callback = context.create_callback(name="payment_callback") - - # Send to payment processor - initiate_payment(callback.callback_id, event["amount"]) - - # Wait for payment result - payment_result = context.wait_for_callback( - callback.callback_id, - config=CallbackConfig(timeout=Duration.from_minutes(5)), - ) - - return {"payment_status": payment_result} -``` - -[↑ Back to top](#table-of-contents) - -## Integration patterns - -### Human approval workflow - -Use callbacks to pause execution while waiting for human approval: - -```python -from aws_durable_execution_sdk_python import DurableContext, durable_execution -from aws_durable_execution_sdk_python.config import CallbackConfig, Duration - -@durable_execution -def handler(event: dict, context: DurableContext) -> dict: - """Process an order that requires approval.""" - order_id = event["order_id"] - - # Create callback for approval - approval_callback = context.create_callback( - name="order_approval", - config=CallbackConfig( - timeout=Duration.from_hours(48), # 48 hours to approve - heartbeat_timeout=Duration.from_hours(12), # Check every 12 hours - ), - ) - - # Send approval request to approval system - # The approval system will use callback.callback_id to respond - send_to_approval_system({ - "callback_id": approval_callback.callback_id, - "order_id": order_id, - "details": event["order_details"], - }) - - # Wait for approval - approval = approval_callback.result() - - if approval and approval.get("approved"): - # Process approved order - return process_order(order_id) - else: - # Handle rejection - return {"status": "rejected", "reason": approval.get("reason")} -``` - -### Payment processing - -Integrate with external payment processors: - -```python -@durable_execution -def handler(event: dict, context: DurableContext) -> dict: - """Process a payment with external processor.""" - amount = event["amount"] - customer_id = event["customer_id"] - - # Create callback for payment 
result - payment_callback = context.create_callback( - name="payment_processing", - config=CallbackConfig( - timeout=Duration.from_minutes(5), - heartbeat_timeout=Duration.from_seconds(30), - ), - ) - - # Initiate payment with external processor - initiate_payment_with_processor({ - "callback_id": payment_callback.callback_id, - "amount": amount, - "customer_id": customer_id, - "callback_url": f"https://api.example.com/callbacks/{payment_callback.callback_id}", - }) - - # Wait for payment result - payment_result = payment_callback.result() - - return { - "transaction_id": payment_result.get("transaction_id"), - "status": payment_result.get("status"), - "amount": amount, - } -``` - -### Third-party API integration - -Wait for responses from third-party APIs: - -```python -@durable_execution -def handler(event: dict, context: DurableContext) -> dict: - """Integrate with third-party data enrichment API.""" - user_data = event["user_data"] - - # Create callback for enrichment result - enrichment_callback = context.create_callback( - name="data_enrichment", - config=CallbackConfig(timeout=Duration.from_minutes(10)), - ) - - # Request data enrichment from third-party - request_data_enrichment({ - "callback_id": enrichment_callback.callback_id, - "user_data": user_data, - "webhook_url": f"https://api.example.com/webhooks/{enrichment_callback.callback_id}", - }) - - # Wait for enriched data - enriched_data = enrichment_callback.result() - - # Combine original and enriched data - return { - "original": user_data, - "enriched": enriched_data, - "timestamp": enriched_data.get("processed_at"), - } -``` - -### Multiple callbacks - -Handle multiple external systems in parallel: - -```python -@durable_execution -def handler(event: dict, context: DurableContext) -> dict: - """Wait for multiple external systems.""" - # Create callbacks for different systems - credit_check = context.create_callback( - name="credit_check", - config=CallbackConfig(timeout=Duration.from_minutes(5)), - 
) - - fraud_check = context.create_callback( - name="fraud_check", - config=CallbackConfig(timeout=Duration.from_minutes(3)), - ) - - # Send requests to external systems - request_credit_check(credit_check.callback_id, event["customer_id"]) - request_fraud_check(fraud_check.callback_id, event["transaction_data"]) - - # Wait for both results - credit_result = credit_check.result() - fraud_result = fraud_check.result() - - # Make decision based on both checks - approved = ( - credit_result.get("score", 0) > 650 and - fraud_result.get("risk_level") == "low" - ) - - return { - "approved": approved, - "credit_score": credit_result.get("score"), - "fraud_risk": fraud_result.get("risk_level"), - } -``` - -[↑ Back to top](#table-of-contents) - -## Advanced patterns - -### Callback with retry - -Combine callbacks with retry logic for resilient integrations: - -```python -from aws_durable_execution_sdk_python import ( - DurableContext, - durable_execution, - durable_step, - StepContext, -) -from aws_durable_execution_sdk_python.config import ( - CallbackConfig, - Duration, - StepConfig, -) -from aws_durable_execution_sdk_python.retries import ( - RetryStrategyConfig, - create_retry_strategy, -) - -@durable_step -def wait_for_external_system( - step_context: StepContext, - callback_id: str, -) -> dict: - """Wait for external system with retry on timeout.""" - # This will retry if the callback times out - result = context.wait_for_callback( - callback_id, - config=CallbackConfig(timeout=Duration.from_minutes(2)), - ) - return result - -@durable_execution -def handler(event: dict, context: DurableContext) -> dict: - # Create callback - callback = context.create_callback(name="external_api") - - # Send request - send_external_request(callback.callback_id) - - # Wait with retry - retry_config = RetryStrategyConfig( - max_attempts=3, - initial_delay_seconds=5, - ) - - result = context.step( - wait_for_external_system(callback.callback_id), - 
config=StepConfig(retry_strategy=create_retry_strategy(retry_config)), - ) - - return result -``` - -### Conditional callback handling - -Handle different callback results based on conditions: - -```python -@durable_execution -def handler(event: dict, context: DurableContext) -> dict: - """Handle callback results conditionally.""" - callback = context.create_callback( - name="conditional_callback", - config=CallbackConfig(timeout=Duration.from_minutes(10)), - ) - - # Send request - send_request(callback.callback_id, event["request_type"]) - - # Wait for result - result = callback.result() - - # Handle different result types - if result is None: - return {"status": "timeout", "message": "No response received"} - - result_type = result.get("type") - - if result_type == "success": - return process_success(result) - elif result_type == "partial": - return process_partial(result) - else: - return process_failure(result) -``` - -### Callback with fallback - -Implement fallback logic when callbacks timeout: - -```python -@durable_execution -def handler(event: dict, context: DurableContext) -> dict: - """Use fallback when callback times out.""" - callback = context.create_callback( - name="primary_service", - config=CallbackConfig(timeout=Duration.from_seconds(30)), - ) - - # Try primary service - send_to_primary_service(callback.callback_id, event["data"]) - - result = callback.result() - - if result is None: - # Primary service timed out, use fallback - fallback_callback = context.create_callback( - name="fallback_service", - config=CallbackConfig(timeout=Duration.from_minutes(2)), - ) - - send_to_fallback_service(fallback_callback.callback_id, event["data"]) - result = fallback_callback.result() - - return {"result": result, "source": "primary" if result else "fallback"} -``` - -[↑ Back to top](#table-of-contents) - -## Best practices - -**Set appropriate timeouts** - Choose timeout values based on your external system's expected response time. 
Add buffer for network delays and processing time. - -**Use heartbeat timeouts for long operations** - Enable heartbeat monitoring for callbacks that take more than a few minutes. This helps detect when external systems stop responding. - -**Send callback IDs securely** - Treat callback IDs as sensitive data. Use HTTPS when sending them to external systems. - -**Handle timeout scenarios** - Always handle the case where `callback.result()` returns `None` due to timeout. Implement fallback logic or error handling. - -**Name callbacks for debugging** - Use descriptive names to identify callbacks in logs and tests. - -**Don't reuse callback IDs** - Each callback gets a unique ID. Don't try to reuse IDs across different operations. - -**Validate callback results** - Always validate the structure and content of callback results before using them. - -**Use type hints** - Specify the expected result type when creating callbacks: `Callback[dict]`, `Callback[str]`, etc. - -**Monitor callback metrics** - Track callback success rates, timeout rates, and response times to identify integration issues. - -**Document callback contracts** - Clearly document what data external systems should send back and in what format. - -[↑ Back to top](#table-of-contents) - -## FAQ - -**Q: What happens if a callback times out?** - -A: If the timeout expires before receiving a result, `callback.result()` returns `None`. You should handle this case in your code. - -**Q: Can I cancel a callback?** - -A: No, callbacks can't be cancelled once created. They either receive a result or timeout. - -**Q: How do external systems send results back?** - -A: External systems use the callback ID to send results through your application's callback endpoint. You need to implement an endpoint that receives the callback ID and result, then forwards it to the durable execution service. - -**Q: Can I create multiple callbacks in one function?** - -A: Yes, you can create as many callbacks as needed. 
Each gets a unique callback ID. - -**Q: What's the maximum timeout for a callback?** - -A: You can set any timeout value using `Duration` helpers. For long-running operations (hours or days), use longer timeouts and enable heartbeat monitoring to detect if external systems stop responding. - -**Q: Do I need to wait for a callback immediately after creating it?** - -A: No, you can create a callback, send its ID to an external system, perform other operations, and wait for the result later in your function. - -**Q: Can callbacks be used with steps?** - -A: Yes, you can create and wait for callbacks inside step functions. However, `context.wait_for_callback()` is a convenience method that already wraps the callback in a step with retry logic for you. - -**Q: What happens if the external system sends a result after the timeout?** - -A: Late results are ignored. The callback has already failed due to timeout. - -**Q: How do I test functions with callbacks?** - -A: Use the testing SDK to simulate callback responses. See the Testing section below for examples. - -**Q: Can I use callbacks in child contexts?** - -A: Yes, callbacks work in child contexts just like in the main context. - -**Q: What's the difference between timeout and heartbeat_timeout?** - -A: `timeout` is the maximum total wait time. `heartbeat_timeout` is the maximum time between heartbeat signals. Use heartbeat timeout to detect when external systems stop responding before the main timeout expires. - -[↑ Back to top](#table-of-contents) - -## Testing - -You can test callbacks using the testing SDK. The test runner lets you simulate callback responses and verify callback behavior. 
- -### Basic callback testing - -```python -import pytest -from aws_durable_execution_sdk_python_testing import InvocationStatus -from examples.src.callback import callback - -@pytest.mark.durable_execution( - handler=callback.handler, - lambda_function_name="callback", -) -def test_callback(durable_runner): - """Test callback creation.""" - with durable_runner: - result = durable_runner.run(input="test", timeout=10) - - # Check overall status - assert result.status is InvocationStatus.SUCCEEDED - - # Verify callback was created - assert "Callback created with ID:" in result.result -``` - -### Inspecting callback operations - -Use `result.operations` to inspect callback details: - -```python -@pytest.mark.durable_execution( - handler=callback.handler, - lambda_function_name="callback", -) -def test_callback_operation(durable_runner): - """Test and inspect callback operation.""" - with durable_runner: - result = durable_runner.run(input="test", timeout=10) - - # Find callback operations - callback_ops = [ - op for op in result.operations - if op.operation_type.value == "CALLBACK" - ] - - assert len(callback_ops) == 1 - callback_op = callback_ops[0] - - # Verify callback properties - assert callback_op.name == "example_callback" - assert callback_op.callback_id is not None -``` - -### Testing callback timeouts - -Test that callbacks handle timeouts correctly: - -```python -from examples.src.callback import callback_with_timeout - -@pytest.mark.durable_execution( - handler=callback_with_timeout.handler, - lambda_function_name="callback_timeout", -) -def test_callback_timeout(durable_runner): - """Test callback with custom timeout.""" - with durable_runner: - result = durable_runner.run(input={}, timeout=10) - - assert result.status is InvocationStatus.SUCCEEDED - assert "60s timeout" in result.result -``` - -### Testing callback integration patterns - -Test complete integration workflows: - -```python -@pytest.mark.durable_execution( - 
handler=approval_workflow_handler, - lambda_function_name="approval_workflow", -) -def test_approval_workflow(durable_runner): - """Test approval workflow with callback.""" - with durable_runner: - result = durable_runner.run( - input={"order_id": "order-123", "amount": 1000}, - timeout=30, - ) - - # Verify workflow completed - assert result.status is InvocationStatus.SUCCEEDED - - # Check callback was created - callback_ops = [ - op for op in result.operations - if op.operation_type.value == "CALLBACK" - ] - assert len(callback_ops) == 1 - assert callback_ops[0].name == "order_approval" -``` - -For more testing patterns, see: -- [Basic tests](../testing-patterns/basic-tests.md) - Simple test examples -- [Complex workflows](../testing-patterns/complex-workflows.md) - Multi-step workflow testing -- [Best practices](../testing-patterns/best-practices.md) - Testing recommendations - -[↑ Back to top](#table-of-contents) - -## See also - -- [DurableContext API](../api-reference/context.md) - Complete context reference -- [CallbackConfig](../api-reference/config.md) - Configuration options -- [Duration helpers](../api-reference/config.md#duration) - Time duration utilities -- [Steps](steps.md) - Combine callbacks with steps for retry logic -- [Child contexts](child-contexts.md) - Use callbacks in nested contexts -- [Error handling](../advanced/error-handling.md) - Handle callback failures -- [Examples](https://github.com/awslabs/aws-durable-execution-sdk-python/tree/main/examples/src/callback) - More callback examples - -[↑ Back to top](#table-of-contents) - -## License - -See the LICENSE file for our project's licensing. 
- -[↑ Back to top](#table-of-contents) diff --git a/docs/core/child-contexts.md b/docs/core/child-contexts.md deleted file mode 100644 index 4c44fd14..00000000 --- a/docs/core/child-contexts.md +++ /dev/null @@ -1,703 +0,0 @@ -# Child Contexts - -## Table of Contents - -- [Terminology](#terminology) -- [What are child contexts?](#what-are-child-contexts) -- [Key features](#key-features) -- [Getting started](#getting-started) -- [Method signatures](#method-signatures) -- [Using the @durable_with_child_context decorator](#using-the-durable_with_child_context-decorator) -- [Naming child contexts](#naming-child-contexts) -- [Use cases for isolation](#use-cases-for-isolation) -- [Advanced patterns](#advanced-patterns) -- [Best practices](#best-practices) -- [FAQ](#faq) -- [Testing](#testing) -- [See also](#see-also) - -[← Back to main index](../index.md) - -## Terminology - -**Child context** - An isolated execution scope within a durable function. Created using `context.run_in_child_context()`. - -**Parent context** - The main durable function context that creates child contexts. - -**Context function** - A function decorated with `@durable_with_child_context` that receives a `DurableContext` and can execute operations. - -**Context isolation** - Child contexts have their own operation namespace, preventing naming conflicts with the parent context. - -**Context result** - The return value from a child context function, which is checkpointed as a single unit in the parent context. - -[↑ Back to top](#table-of-contents) - -## What are child contexts? - -A child context creates a scope in which you can nest durable operations. It creates an isolated execution scope with its own set of operations, checkpoints, and state. This is often useful as a unit of concurrency that lets you run concurrent operations within your durable function. 
You can also use child contexts to wrap large chunks of durable logic into a single piece - once completed, that logic won't run or replay again. - -Use child contexts to: -- Run concurrent operations (steps, waits, callbacks) in parallel -- Wrap large blocks of logic that should execute as a single unit -- Handle large data that exceeds individual step limits -- Isolate groups of related operations -- Create reusable components -- Improve code organization and maintainability - -[↑ Back to top](#table-of-contents) - -## Key features - -- **Concurrency unit** - Run multiple operations concurrently within your function -- **Execution isolation** - Child contexts have their own operation namespace -- **Single-unit checkpointing** - Completed child contexts never replay -- **Large data handling** - Process data that exceeds individual step limits -- **Named contexts** - Identify contexts by name for debugging and testing - -[↑ Back to top](#table-of-contents) - -## Getting started - -Here's an example showing why child contexts are useful - they let you group multiple operations that execute as a single unit: - -```python -from aws_durable_execution_sdk_python import ( - DurableContext, - durable_execution, - durable_step, - durable_with_child_context, - StepContext, -) - -@durable_step -def validate_order(step_context: StepContext, order_id: str) -> dict: - """Validate order details.""" - # Validation logic here - return {"valid": True, "order_id": order_id} - -@durable_step -def reserve_inventory(step_context: StepContext, order_id: str) -> dict: - """Reserve inventory for order.""" - # Inventory logic here - return {"reserved": True, "order_id": order_id} - -@durable_step -def charge_payment(step_context: StepContext, order_id: str) -> dict: - """Charge payment for order.""" - # Payment logic here - return {"charged": True, "order_id": order_id} - -@durable_step -def send_confirmation(step_context: StepContext, result: dict) -> dict: - """Send order 
confirmation.""" - # Notification logic here - return {"sent": True, "order_id": result["order_id"]} - -@durable_with_child_context -def process_order(ctx: DurableContext, order_id: str) -> dict: - """Process an order with multiple steps.""" - # These three steps execute as a single unit - validation = ctx.step(validate_order(order_id)) - inventory = ctx.step(reserve_inventory(order_id)) - payment = ctx.step(charge_payment(order_id)) - - return {"order_id": order_id, "status": "completed"} - -@durable_execution -def handler(event: dict, context: DurableContext) -> dict: - """Process order using a child context.""" - # Once this completes, it never replays - even if the function continues - result = context.run_in_child_context( - process_order(event["order_id"]), - name="order_processing" - ) - - # Additional operations here won't cause process_order to replay - context.step(send_confirmation(result)) - - return result -``` - -**Why use a child context here?** - -Child contexts let you group related operations into a logical unit. Once `process_order` completes, its result is saved just like a step - everything inside won't replay even if the function continues or restarts. This provides organizational benefits and a small optimization by avoiding unnecessary replays. - -**Key benefits:** - -- **Organization**: Group related operations together for better code structure and readability -- **Reusability**: Call `process_order` multiple times in the same function, and each execution is tracked independently -- **Isolation**: Child contexts act like checkpointed functions - once done, they're done - -[↑ Back to top](#table-of-contents) - -## Method signatures - -### context.run_in_child_context() - -```python -def run_in_child_context( - func: Callable[[DurableContext], T], - name: str | None = None, -) -> T -``` - -**Parameters:** - -- `func` - A callable that receives a `DurableContext` and returns a result. 
Use the `@durable_with_child_context` decorator to create context functions. -- `name` (optional) - A name for the child context, useful for debugging and testing - -**Returns:** The result of executing the context function. - -**Raises:** Any exception raised by the context function. - -### @durable_with_child_context decorator - -```python -@durable_with_child_context -def my_context_function(ctx: DurableContext, arg1: str, arg2: int) -> dict: - # Your operations here - return result -``` - -The decorator wraps your function so it can be called with arguments and passed to `context.run_in_child_context()`. - -[↑ Back to top](#table-of-contents) - -## Using the @durable_with_child_context decorator - -The `@durable_with_child_context` decorator marks a function as a context function. Context functions receive a `DurableContext` as their first parameter and can execute any durable operations: - -```python -from aws_durable_execution_sdk_python import ( - DurableContext, - durable_execution, - durable_with_child_context, -) - -@durable_with_child_context -def process_order(ctx: DurableContext, order_id: str, items: list) -> dict: - """Process an order in a child context.""" - # Validate items - validation = ctx.step( - lambda _: validate_items(items), - name="validate_items" - ) - - if not validation["valid"]: - return {"status": "invalid", "errors": validation["errors"]} - - # Calculate total - total = ctx.step( - lambda _: calculate_total(items), - name="calculate_total" - ) - - # Process payment - payment = ctx.step( - lambda _: process_payment(order_id, total), - name="process_payment" - ) - - return { - "order_id": order_id, - "total": total, - "payment_status": payment["status"], - } - -@durable_execution -def handler(event: dict, context: DurableContext) -> dict: - """Process an order using a child context.""" - order_id = event["order_id"] - items = event["items"] - - # Execute order processing in child context - result = context.run_in_child_context( - 
process_order(order_id, items) - ) - - return result -``` - -**Why use @durable_with_child_context?** - -The decorator wraps your function so it can be called with arguments and passed to `context.run_in_child_context()`. It provides a convenient way to define reusable workflow components. - -[↑ Back to top](#table-of-contents) - -## Naming child contexts - -You can name child contexts explicitly using the `name` parameter. Named contexts are easier to identify in logs and tests: - -```python -@durable_with_child_context -def data_processing(ctx: DurableContext, data: dict) -> dict: - """Process data in a child context.""" - result = ctx.step(lambda _: transform_data(data), name="transform") - return result - -@durable_execution -def handler(event: dict, context: DurableContext) -> dict: - # Named child context - result = context.run_in_child_context( - data_processing(event["data"]), - name="data_processor" - ) - return result -``` - -**Naming best practices:** - -- Use descriptive names that explain what the context does -- Keep names consistent across your codebase -- Use names when you need to inspect specific contexts in tests -- Names help with debugging and monitoring - -[↑ Back to top](#table-of-contents) - -## Use cases for isolation - -### Organizing complex workflows - -Use child contexts to organize complex workflows into logical units: - -```python -@durable_with_child_context -def inventory_check(ctx: DurableContext, items: list) -> dict: - """Check inventory for all items.""" - results = [] - for item in items: - available = ctx.step( - lambda _: check_item_availability(item), - name=f"check_{item['id']}" - ) - results.append({"item_id": item["id"], "available": available}) - - return {"all_available": all(r["available"] for r in results)} - -@durable_with_child_context -def payment_processing(ctx: DurableContext, order_total: float) -> dict: - """Process payment in isolated context.""" - auth = ctx.step( - lambda _: authorize_payment(order_total), - 
name="authorize" - ) - - if auth["approved"]: - capture = ctx.step( - lambda _: capture_payment(auth["transaction_id"]), - name="capture" - ) - return {"status": "completed", "transaction_id": capture["id"]} - - return {"status": "declined"} - -@durable_execution -def handler(event: dict, context: DurableContext) -> dict: - """Process order with organized child contexts.""" - # Check inventory - inventory = context.run_in_child_context( - inventory_check(event["items"]), - name="inventory_check" - ) - - if not inventory["all_available"]: - return {"status": "failed", "reason": "items_unavailable"} - - # Process payment - payment = context.run_in_child_context( - payment_processing(event["total"]), - name="payment_processing" - ) - - if payment["status"] != "completed": - return {"status": "failed", "reason": "payment_declined"} - - return { - "status": "success", - "transaction_id": payment["transaction_id"], - } -``` - -### Creating reusable components - -Child contexts make it easy to create reusable workflow components: - -```python -@durable_with_child_context -def send_notifications(ctx: DurableContext, user_id: str, message: str) -> dict: - """Send notifications through multiple channels.""" - email_sent = ctx.step( - lambda _: send_email(user_id, message), - name="send_email" - ) - - sms_sent = ctx.step( - lambda _: send_sms(user_id, message), - name="send_sms" - ) - - push_sent = ctx.step( - lambda _: send_push_notification(user_id, message), - name="send_push" - ) - - return { - "email": email_sent, - "sms": sms_sent, - "push": push_sent, - } - -@durable_execution -def order_confirmation_handler(event: dict, context: DurableContext) -> dict: - """Send order confirmation notifications.""" - notifications = context.run_in_child_context( - send_notifications( - event["user_id"], - f"Order {event['order_id']} confirmed" - ), - name="order_notifications" - ) - - return {"notifications_sent": notifications} - -@durable_execution -def shipment_handler(event: 
dict, context: DurableContext) -> dict: - """Send shipment notifications.""" - notifications = context.run_in_child_context( - send_notifications( - event["user_id"], - f"Order {event['order_id']} shipped" - ), - name="shipment_notifications" - ) - - return {"notifications_sent": notifications} -``` - -[↑ Back to top](#table-of-contents) - -## Advanced patterns - -### Conditional child contexts - -Execute child contexts based on conditions: - -```python -@durable_with_child_context -def standard_processing(ctx: DurableContext, data: dict) -> dict: - """Standard data processing.""" - result = ctx.step(lambda _: process_standard(data), name="process") - return {"type": "standard", "result": result} - -@durable_with_child_context -def premium_processing(ctx: DurableContext, data: dict) -> dict: - """Premium data processing with extra steps.""" - enhanced = ctx.step(lambda _: enhance_data(data), name="enhance") - validated = ctx.step(lambda _: validate_premium(enhanced), name="validate") - result = ctx.step(lambda _: process_premium(validated), name="process") - return {"type": "premium", "result": result} - -@durable_execution -def handler(event: dict, context: DurableContext) -> dict: - """Process data based on customer tier.""" - customer_tier = event.get("tier", "standard") - - if customer_tier == "premium": - result = context.run_in_child_context( - premium_processing(event["data"]), - name="premium_processing" - ) - else: - result = context.run_in_child_context( - standard_processing(event["data"]), - name="standard_processing" - ) - - return result -``` - -### Error handling in child contexts - -Handle errors within child contexts: - -```python -@durable_with_child_context -def risky_operation(ctx: DurableContext, data: dict) -> dict: - """Operation that might fail.""" - try: - result = ctx.step( - lambda _: potentially_failing_operation(data), - name="risky_step" - ) - return {"status": "success", "result": result} - except Exception as e: - # Handle error 
within child context - fallback = ctx.step( - lambda _: fallback_operation(data), - name="fallback" - ) - return {"status": "fallback", "result": fallback, "error": str(e)} - -@durable_execution -def handler(event: dict, context: DurableContext) -> dict: - """Handle errors in child context.""" - result = context.run_in_child_context( - risky_operation(event["data"]), - name="risky_operation" - ) - - if result["status"] == "fallback": - # Log or handle fallback scenario - return {"warning": "Used fallback", "result": result["result"]} - - return result -``` - -### Sequential child contexts - -Execute multiple child contexts sequentially: - -```python -@durable_with_child_context -def process_region_a(ctx: DurableContext, data: dict) -> dict: - """Process data for region A.""" - result = ctx.step(lambda _: process_for_region("A", data), name="process_a") - return {"region": "A", "result": result} - -@durable_with_child_context -def process_region_b(ctx: DurableContext, data: dict) -> dict: - """Process data for region B.""" - result = ctx.step(lambda _: process_for_region("B", data), name="process_b") - return {"region": "B", "result": result} - -@durable_with_child_context -def process_region_c(ctx: DurableContext, data: dict) -> dict: - """Process data for region C.""" - result = ctx.step(lambda _: process_for_region("C", data), name="process_c") - return {"region": "C", "result": result} - -@durable_execution -def handler(event: dict, context: DurableContext) -> dict: - """Process data for multiple regions sequentially.""" - data = event["data"] - - # Execute child contexts sequentially - result_a = context.run_in_child_context( - process_region_a(data), - name="region_a" - ) - - result_b = context.run_in_child_context( - process_region_b(data), - name="region_b" - ) - - result_c = context.run_in_child_context( - process_region_c(data), - name="region_c" - ) - - return { - "regions_processed": 3, - "results": [result_a, result_b, result_c], - } -``` - -For 
parallel execution, use `context.parallel()` instead. See [Parallel operations](parallel.md) for details. - -[↑ Back to top](#table-of-contents) - -## Best practices - -**Use child contexts for logical grouping** - Group related operations together in a child context to improve code organization and readability. - -**Name contexts descriptively** - Use clear names that explain what the context does. This helps with debugging and testing. - -**Keep context functions focused** - Each context function should have a single, well-defined purpose. Don't create overly complex context functions. - -**Use child contexts for large data** - When processing data that exceeds step size limits, break it into multiple steps within a child context. - -**Create reusable components** - Design context functions that can be reused across different workflows. - -**Handle errors appropriately** - Decide whether to handle errors within the child context or let them propagate to the parent. - -**Pass data through parameters** - Pass data to child contexts through function parameters, not global variables. - -**Document context functions** - Add docstrings explaining what the context does and what it returns. - -**Test context functions independently** - Write tests for individual context functions to ensure they work correctly in isolation. - -[↑ Back to top](#table-of-contents) - -## FAQ - -**Q: What's the difference between a child context and a step?** - -A: A step is a single operation that checkpoints its result. A child context is a collection of operations (steps, waits, callbacks, etc.) that execute in an isolated scope. The entire child context result is checkpointed as a single unit in the parent context. - -**Q: Can I use steps inside child contexts?** - -A: Yes, child contexts can contain any durable operations: steps, waits, and callbacks. 
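
The step-versus-child-context distinction in this FAQ can be pictured with a toy model in plain Python. This is only a sketch and does not use the SDK: `ToyContext`, its `step` and `run_in_child_context` methods, and the dictionary used as a checkpoint store are hypothetical stand-ins chosen for illustration, and the real checkpoint format is not shown in this document.

```python
from typing import Any, Callable


class ToyContext:
    """Hypothetical stand-in for a durable context; NOT the SDK's DurableContext."""

    def __init__(self, checkpoints: dict, prefix: str = "") -> None:
        self.checkpoints = checkpoints  # survives across simulated replays
        self.prefix = prefix            # gives each child its own namespace

    def step(self, name: str, func: Callable[[], Any]) -> Any:
        key = f"{self.prefix}{name}"
        if key not in self.checkpoints:   # first run: execute and record
            self.checkpoints[key] = func()
        return self.checkpoints[key]      # replay: reuse the recorded result

    def run_in_child_context(
        self, name: str, func: Callable[["ToyContext"], Any]
    ) -> Any:
        key = f"{self.prefix}{name}"
        if key not in self.checkpoints:
            # The child records its steps under its own prefix (isolation).
            child = ToyContext(self.checkpoints, prefix=f"{key}/")
            # The child's return value is recorded as one unit in the parent.
            self.checkpoints[key] = func(child)
        return self.checkpoints[key]


calls: list = []  # records which functions actually executed


def validate() -> str:
    calls.append("validate")
    return "valid"


def charge() -> str:
    calls.append("charge")
    return "charged"


def process_order(ctx: ToyContext) -> dict:
    calls.append("process_order")
    return {
        "validation": ctx.step("validate", validate),
        "payment": ctx.step("charge", charge),
    }


def workflow(store: dict) -> dict:
    return ToyContext(store).run_in_child_context("order_processing", process_order)


store: dict = {}
first = workflow(store)   # executes the child context and its two steps
second = workflow(store)  # simulated replay: nothing inside the child runs again
```

In this model the second call returns the recorded result without calling `process_order`, `validate`, or `charge` again, which mirrors the "single-unit checkpointing" behavior described above.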
- -**Q: When should I use a child context vs multiple steps?** - -A: Use child contexts when you want to: -- Group related operations logically -- Create reusable workflow components -- Handle data larger than step size limits -- Isolate operations from the parent context - -Use multiple steps when operations are independent and don't need isolation. - -**Q: Can child contexts access the parent context?** - -A: No, child contexts receive their own `DurableContext` instance. They can't access the parent context directly. Pass data through function parameters. - -**Q: What happens if a child context fails?** - -A: If an operation within a child context raises an exception, the exception propagates to the parent context unless you handle it within the child context. - -**Q: Can I create multiple child contexts in one function?** - -A: Yes, you can create as many child contexts as needed. They execute sequentially by default. For parallel execution, use `context.parallel()` instead. - -**Q: Can I use callbacks in child contexts?** - -A: Yes, child contexts support all durable operations including callbacks, waits, and steps. - -**Q: Can I pass large data to child contexts?** - -A: Yes, but be mindful of Lambda payload limits. If data is very large, consider storing it externally (S3, DynamoDB) and passing references. - -**Q: Do child contexts share the same logger?** - -A: Yes. The logger is inherited from the parent context, and you access it through the child context's `ctx.logger`. - -[↑ Back to top](#table-of-contents) - -## Testing - -You can test child contexts using the testing SDK. The test runner executes your function and lets you inspect child context results.
- -### Basic child context testing - -```python -import pytest -from aws_durable_execution_sdk_python_testing import InvocationStatus -from examples.src.run_in_child_context import run_in_child_context - -@pytest.mark.durable_execution( - handler=run_in_child_context.handler, - lambda_function_name="run in child context", -) -def test_run_in_child_context(durable_runner): - """Test basic child context execution.""" - with durable_runner: - result = durable_runner.run(input="test", timeout=10) - - # Check overall status - assert result.status is InvocationStatus.SUCCEEDED - assert result.result == "Child context result: 10" -``` - -### Inspecting child context operations - -Use `result.get_context()` to inspect child context results: - -```python -@pytest.mark.durable_execution( - handler=run_in_child_context.handler, - lambda_function_name="run in child context", -) -def test_child_context_operations(durable_runner): - """Test and inspect child context operations.""" - with durable_runner: - result = durable_runner.run(input="test", timeout=10) - - # Verify child context operation exists - context_ops = [ - op for op in result.operations - if op.operation_type.value == "CONTEXT" - ] - assert len(context_ops) >= 1 - - # Get child context by name (if named) - child_result = result.get_context("child_operation") - assert child_result is not None -``` - -### Testing large data handling - -Test that child contexts handle large data correctly: - -```python -from examples.src.run_in_child_context import run_in_child_context_large_data - -@pytest.mark.durable_execution( - handler=run_in_child_context_large_data.handler, - lambda_function_name="run in child context large data", -) -def test_large_data_processing(durable_runner): - """Test large data handling with child context.""" - with durable_runner: - result = durable_runner.run(input=None, timeout=30) - - result_data = result.result - - # Verify execution succeeded - assert result.status is InvocationStatus.SUCCEEDED - 
assert result_data["success"] is True - - # Verify large data was processed - assert result_data["summary"]["totalDataSize"] > 240 # ~250KB - assert result_data["summary"]["stepsExecuted"] == 5 - - # Verify data integrity across wait - assert result_data["dataIntegrityCheck"] is True -``` - - - -### Testing error handling - -Test that child contexts handle errors correctly: - -```python -@pytest.mark.durable_execution( - handler=error_handling_handler, - lambda_function_name="error_handling", -) -def test_child_context_error_handling(durable_runner): - """Test error handling in child context.""" - with durable_runner: - result = durable_runner.run(input={"data": "invalid"}, timeout=10) - - # Function should handle error gracefully - assert result.status is InvocationStatus.SUCCEEDED - assert result.result["status"] == "fallback" - assert "error" in result.result -``` - -For more testing patterns, see: -- [Basic tests](../testing-patterns/basic-tests.md) - Simple test examples -- [Complex workflows](../testing-patterns/complex-workflows.md) - Multi-step workflow testing -- [Best practices](../testing-patterns/best-practices.md) - Testing recommendations - -[↑ Back to top](#table-of-contents) - -## See also - -- [DurableContext API](../api-reference/context.md) - Complete context reference -- [Steps](steps.md) - Use steps within child contexts -- [Wait operations](wait.md) - Use waits within child contexts -- [Callbacks](callbacks.md) - Use callbacks within child contexts -- [Parallel operations](parallel.md) - Execute child contexts in parallel -- [Examples](https://github.com/awslabs/aws-durable-execution-sdk-python/tree/main/examples/src/run_in_child_context) - More child context examples - -[↑ Back to top](#table-of-contents) - -## License - -See the LICENSE file for our project's licensing. 
- -[↑ Back to top](#table-of-contents) diff --git a/docs/core/invoke.md b/docs/core/invoke.md deleted file mode 100644 index a6bac65a..00000000 --- a/docs/core/invoke.md +++ /dev/null @@ -1,774 +0,0 @@ -# Invoke Operations - -## Table of Contents - -- [Terminology](#terminology) -- [What are invoke operations?](#what-are-invoke-operations) -- [Key features](#key-features) -- [Getting started](#getting-started) -- [Method signature](#method-signature) -- [Function composition patterns](#function-composition-patterns) -- [Configuration](#configuration) -- [Error handling](#error-handling) -- [Advanced patterns](#advanced-patterns) -- [Best practices](#best-practices) -- [FAQ](#faq) -- [Testing](#testing) -- [See also](#see-also) - -[← Back to main index](../index.md) - -## Terminology - -**Invoke operation** - A durable operation that calls another Lambda function (durable or on-demand) and waits for its result. Created using `context.invoke()`. - -**Chained invocation** - The process of one durable function calling another function. The calling function suspends while the invoked function executes. - -**Function composition** - Building complex workflows by combining multiple functions, where each function handles a specific part of the overall process. - -**Payload** - The input data sent to the invoked function. Can be any JSON-serializable value, or any value handled by custom serialization. - -**Timeout** - The maximum time to wait for an invoked function to complete. If exceeded, the invoke operation fails with a timeout error. - -[↑ Back to top](#table-of-contents) - -## What are invoke operations? - -Invoke operations let you call other Lambda functions from within your durable function. You can invoke both durable functions and regular on-demand Lambda functions. This enables function composition, where you break complex workflows into smaller, reusable functions. The calling function suspends while the invoked function executes, and resumes when the result is available.
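
The suspend-and-resume behavior described here can be pictured with a small simulation in plain Python. This is a sketch under stated assumptions, not the SDK's implementation: `ToyInvoker`, `Suspended`, `run_pending`, and `run_to_completion` are hypothetical names, and modeling resumption as "re-run the handler and return recorded results" is an assumption made for illustration.

```python
from typing import Any, Callable


class Suspended(Exception):
    """Raised in this toy model to end a run while an invoke result is pending."""


class ToyInvoker:
    """Hypothetical stand-in for a durable context; NOT the SDK's DurableContext."""

    def __init__(self, functions: dict) -> None:
        self.functions = functions  # hypothetical registry: function name -> callable
        self.pending: dict = {}     # operation name -> (function_name, payload)
        self.results: dict = {}     # operation name -> recorded result

    def invoke(self, function_name: str, payload: Any, name: str) -> Any:
        if name in self.results:    # already completed: return the recorded result
            return self.results[name]
        self.pending[name] = (function_name, payload)  # record the request
        raise Suspended(name)       # stop this run until the result exists

    def run_pending(self) -> None:
        """Simulate the target functions finishing while the caller is suspended."""
        for name, (function_name, payload) in list(self.pending.items()):
            self.results[name] = self.functions[function_name](payload)
            del self.pending[name]


def run_to_completion(handler: Callable, event: dict, ctx: ToyInvoker) -> tuple:
    """Re-run the handler after each suspension; return (result, number of runs)."""
    runs = 0
    while True:
        runs += 1
        try:
            return handler(event, ctx), runs
        except Suspended:
            ctx.run_pending()


def process_order(event: dict, ctx: ToyInvoker) -> dict:
    validation = ctx.invoke(
        "validate-order", {"order_id": event["order_id"]}, name="validate_order"
    )
    if not validation["valid"]:
        return {"status": "rejected"}
    payment = ctx.invoke(
        "process-payment",
        {"order_id": event["order_id"], "amount": event["amount"]},
        name="process_payment",
    )
    return {"status": "completed", "transaction_id": payment["transaction_id"]}


# Hypothetical target functions, defined inline so the sketch is self-contained.
functions = {
    "validate-order": lambda payload: {"valid": True},
    "process-payment": lambda payload: {"transaction_id": f"txn-{payload['order_id']}"},
}

result, runs = run_to_completion(
    process_order, {"order_id": "order-123", "amount": 1000}, ToyInvoker(functions)
)
```

In this simulation the handler runs three times: once until the first invoke suspends it, once until the second invoke suspends it, and once to completion. Each target function still executes only once, because completed invokes return their recorded result.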
- -Use invoke operations to: -- Modularize complex workflows into manageable functions -- Call existing Lambda functions (durable or on-demand) from your workflow -- Isolate different parts of your business logic -- Build hierarchical execution patterns -- Coordinate multiple Lambda functions durably -- Integrate with existing Lambda-based services - -When you invoke a function, the SDK: -1. Checkpoints the invoke operation -2. Triggers the target function asynchronously -3. Suspends the calling function -4. Resumes the calling function when the result is ready -5. Returns the result or propagates any errors - -[↑ Back to top](#table-of-contents) - -## Key features - -- **Automatic checkpointing** - Invoke operations are checkpointed before execution -- **Asynchronous execution** - Invoked functions run independently without blocking resources -- **Result handling** - Results are automatically deserialized and returned -- **Error propagation** - Errors from invoked functions propagate to the caller -- **Timeout support** - Configure maximum wait time for invoked functions -- **Custom serialization** - Control how payloads and results are serialized -- **Named operations** - Identify invoke operations by name for debugging - -[↑ Back to top](#table-of-contents) - -## Getting started - -Here's a simple example of invoking another durable function: - -```python -from aws_durable_execution_sdk_python import ( - DurableContext, - durable_execution, -) - -@durable_execution -def process_order(event: dict, context: DurableContext) -> dict: - """Process an order by validating and charging.""" - order_id = event["order_id"] - amount = event["amount"] - - # Invoke validation function - validation_result = context.invoke( - function_name="validate-order", - payload={"order_id": order_id}, - name="validate_order", - ) - - if not validation_result["valid"]: - return {"status": "rejected", "reason": validation_result["reason"]} - - # Invoke payment function - payment_result = 
context.invoke( - function_name="process-payment", - payload={"order_id": order_id, "amount": amount}, - name="process_payment", - ) - - return { - "status": "completed", - "order_id": order_id, - "transaction_id": payment_result["transaction_id"], - } -``` - -When this function runs: -1. It invokes the `validate-order` function and waits for the result -2. If validation succeeds, it invokes the `process-payment` function -3. Each invoke operation is checkpointed automatically -4. If the function is interrupted, it resumes from the last completed invoke - -[↑ Back to top](#table-of-contents) - -## Method signature - -### context.invoke() - -```python -def invoke( - function_name: str, - payload: P, - name: str | None = None, - config: InvokeConfig[P, R] | None = None, -) -> R -``` - -**Parameters:** - -- `function_name` - The name of the Lambda function to invoke. This should be the function name, not the ARN. -- `payload` - The input data to send to the invoked function. Can be any JSON-serializable value. -- `name` (optional) - A name for the invoke operation, useful for debugging and testing. -- `config` (optional) - An `InvokeConfig` object to configure timeout and serialization. - -**Returns:** The result returned by the invoked function. 
- -**Raises:** -- `CallableRuntimeError` - If the invoked function fails or times out - -[↑ Back to top](#table-of-contents) - -## Function composition patterns - -### Sequential invocations - -Call multiple functions in sequence, where each depends on the previous result: - -```python -@durable_execution -def orchestrate_workflow(event: dict, context: DurableContext) -> dict: - """Orchestrate a multi-step workflow.""" - user_id = event["user_id"] - - # Step 1: Fetch user data - user = context.invoke( - function_name="fetch-user", - payload={"user_id": user_id}, - name="fetch_user", - ) - - # Step 2: Enrich user data - enriched_user = context.invoke( - function_name="enrich-user-data", - payload=user, - name="enrich_user", - ) - - # Step 3: Generate report - report = context.invoke( - function_name="generate-report", - payload=enriched_user, - name="generate_report", - ) - - return report -``` - -### Conditional invocations - -Invoke different functions based on conditions: - -```python -@durable_execution -def process_document(event: dict, context: DurableContext) -> dict: - """Process a document based on its type.""" - document_type = event["document_type"] - document_data = event["data"] - - if document_type == "pdf": - result = context.invoke( - function_name="process-pdf", - payload=document_data, - name="process_pdf", - ) - elif document_type == "image": - result = context.invoke( - function_name="process-image", - payload=document_data, - name="process_image", - ) - else: - result = context.invoke( - function_name="process-generic", - payload=document_data, - name="process_generic", - ) - - return result -``` - -### Hierarchical workflows - -Build hierarchical workflows where parent functions coordinate child functions: - -```python -@durable_execution -def parent_workflow(event: dict, context: DurableContext) -> dict: - """Parent workflow that coordinates sub-workflows.""" - project_id = event["project_id"] - - # Invoke sub-workflow for data collection - 
data = context.invoke( - function_name="collect-data-workflow", - payload={"project_id": project_id}, - name="collect_data", - ) - - # Invoke sub-workflow for data processing - processed = context.invoke( - function_name="process-data-workflow", - payload=data, - name="process_data", - ) - - # Invoke sub-workflow for reporting - report = context.invoke( - function_name="generate-report-workflow", - payload=processed, - name="generate_report", - ) - - return report -``` - -### Invoking on-demand functions - -You can invoke regular Lambda functions (non-durable) from your durable workflow: - -```python -@durable_execution -def handler(event: dict, context: DurableContext) -> dict: - """Invoke a mix of durable and on-demand functions.""" - user_id = event["user_id"] - - # Invoke a regular Lambda function for data fetching - user_data = context.invoke( - function_name="fetch-user-data", # Regular Lambda function - payload={"user_id": user_id}, - name="fetch_user", - ) - - # Invoke a durable function for complex processing - processed = context.invoke( - function_name="process-user-workflow", # Durable function - payload=user_data, - name="process_user", - ) - - # Invoke another regular Lambda for notifications - notification = context.invoke( - function_name="send-notification", # Regular Lambda function - payload={"user_id": user_id, "data": processed}, - name="send_notification", - ) - - return { - "status": "completed", - "notification_sent": notification["sent"], - } -``` - -[↑ Back to top](#table-of-contents) - -## Configuration - -Configure invoke behavior using `InvokeConfig`: - -```python -from aws_durable_execution_sdk_python import ( - DurableContext, - durable_execution, -) -from aws_durable_execution_sdk_python.config import Duration, InvokeConfig - -@durable_execution -def handler(event: dict, context: DurableContext) -> dict: - # Configure invoke with timeout - invoke_config = InvokeConfig( - timeout=Duration.from_minutes(5), - ) - - result = 
context.invoke( - function_name="long-running-function", - payload=event, - name="long_running", - config=invoke_config, - ) - - return result -``` - -### InvokeConfig parameters - -**timeout** - Maximum duration to wait for the invoked function to complete. Default is no timeout. Use this to prevent long-running invocations from blocking execution indefinitely. - -**serdes_payload** - Custom serialization/deserialization for the payload sent to the invoked function. If None, uses default JSON serialization. - -**serdes_result** - Custom serialization/deserialization for the result returned from the invoked function. If None, uses default JSON serialization. - -**tenant_id** - Optional tenant identifier for multi-tenant isolation. If provided, the invocation will be scoped to this tenant. - -### Setting timeouts - -Use the `Duration` class to set timeouts: - -```python -from aws_durable_execution_sdk_python.config import Duration, InvokeConfig - -# Timeout after 30 seconds -config = InvokeConfig(timeout=Duration.from_seconds(30)) - -# Timeout after 5 minutes -config = InvokeConfig(timeout=Duration.from_minutes(5)) - -# Timeout after 2 hours -config = InvokeConfig(timeout=Duration.from_hours(2)) -``` - -[↑ Back to top](#table-of-contents) - -## Error handling - -### Handling invocation errors - -Errors from invoked functions propagate to the calling function. 
Catch and handle them as needed: - -```python -from aws_durable_execution_sdk_python import ( - DurableContext, - durable_execution, - CallableRuntimeError, -) - -@durable_execution -def handler(event: dict, context: DurableContext) -> dict: - """Handle errors from invoked functions.""" - try: - result = context.invoke( - function_name="risky-function", - payload=event, - name="risky_operation", - ) - return {"status": "success", "result": result} - - except CallableRuntimeError as e: - # Handle the error from the invoked function - context.logger.error(f"Invoked function failed: {e}") - return { - "status": "failed", - "error": str(e), - } -``` - -### Timeout handling - -A timeout surfaces as a `CallableRuntimeError`, so this example inspects the error message to handle timeouts specifically: - -```python -from aws_durable_execution_sdk_python import ( - DurableContext, - durable_execution, - CallableRuntimeError, -) -from aws_durable_execution_sdk_python.config import Duration, InvokeConfig - -@durable_execution -def handler(event: dict, context: DurableContext) -> dict: - """Handle timeout errors.""" - config = InvokeConfig(timeout=Duration.from_seconds(30)) - - try: - result = context.invoke( - function_name="slow-function", - payload=event, - config=config, - ) - return {"status": "success", "result": result} - - except CallableRuntimeError as e: - if "timed out" in str(e).lower(): - context.logger.warning("Function timed out, using fallback") - return {"status": "timeout", "fallback": True} - raise -``` - -### Retry patterns - -Implement retry logic for failed invocations: - -```python -from aws_durable_execution_sdk_python import ( - DurableContext, - durable_execution, - CallableRuntimeError, -) -from aws_durable_execution_sdk_python.config import Duration - -@durable_execution -def handler(event: dict, context: DurableContext) -> dict: - """Retry failed invocations.""" - max_retries = 3 - - for attempt in range(max_retries): - try: - result = context.invoke( - function_name="unreliable-function", - payload=event, - name=f"attempt_{attempt + 1}", - ) - return {"status": "success", "result": result, "attempts": attempt + 1} - - except CallableRuntimeError as e: - if attempt == max_retries - 1: - # Last attempt failed - return { - "status": "failed", - "error": str(e), - "attempts": max_retries, - } - # Wait before retrying - 
context.wait(Duration.from_seconds(2 ** attempt)) - - return {"status": "failed", "reason": "max_retries_exceeded"} -``` - -[↑ Back to top](#table-of-contents) - -## Advanced patterns - -### Custom serialization - -Use custom serialization for complex data types: - -```python -import json - -from aws_durable_execution_sdk_python import DurableContext, durable_execution -from aws_durable_execution_sdk_python.config import InvokeConfig -from aws_durable_execution_sdk_python.serdes import SerDes - -class CustomSerDes(SerDes): - """Custom serialization for complex objects.""" - - def serialize(self, value): - # Custom serialization logic - return json.dumps({"custom": value}) - - def deserialize(self, data: str): - # Custom deserialization logic - obj = json.loads(data) - return obj["custom"] - -@durable_execution -def handler(event: dict, context: DurableContext) -> dict: - """Use custom serialization.""" - config = InvokeConfig( - serdes_payload=CustomSerDes(), - serdes_result=CustomSerDes(), - ) - - result = context.invoke( - function_name="custom-function", - payload={"complex": "data"}, - config=config, - ) - - return result -``` - -### Fan-out pattern with parallel invocations - -Fan out to multiple services. This simplified example invokes each service sequentially, because `context.invoke()` does not run invocations in parallel on its own: - -```python -from aws_durable_execution_sdk_python import ( - DurableContext, - StepContext, - durable_execution, - durable_step, -) - -@durable_step -def invoke_service(step_context: StepContext, service_name: str, data: dict) -> dict: - """Invoke a service and return its result.""" - # Note: This is a simplified example. In practice, you'd need access to context - # which isn't directly available in step functions. 
- return {"service": service_name, "result": data} - -@durable_execution -def handler(event: dict, context: DurableContext) -> dict: - """Fan out to multiple services.""" - services = ["service-a", "service-b", "service-c"] - - # Invoke each service sequentially - results = [] - for service in services: - result = context.invoke( - function_name=service, - payload=event, - name=f"invoke_{service}", - ) - results.append(result) - - return {"results": results} -``` - -### Passing context between invocations - -Pass data between invoked functions: - -```python -@durable_execution -def handler(event: dict, context: DurableContext) -> dict: - """Pass context between invocations.""" - # First invocation creates context - initial_context = context.invoke( - function_name="initialize-context", - payload=event, - name="initialize", - ) - - # Second invocation uses the context - processed = context.invoke( - function_name="process-with-context", - payload={ - "data": event["data"], - "context": initial_context, - }, - name="process", - ) - - # Third invocation finalizes - final_result = context.invoke( - function_name="finalize", - payload={ - "processed": processed, - "context": initial_context, - }, - name="finalize", - ) - - return final_result -``` - -[↑ Back to top](#table-of-contents) - -## Best practices - -**Use descriptive function names** - Choose clear, descriptive names for the functions you invoke to make workflows easier to understand. - -**Name invoke operations** - Use the `name` parameter to identify invoke operations in logs and tests. - -**Set appropriate timeouts** - Configure timeouts based on expected execution time. Don't set them too short or too long. - -**Handle errors explicitly** - Catch and handle errors from invoked functions. Don't let them propagate unexpectedly. - -**Keep payloads small** - Large payloads increase serialization overhead. Consider passing references instead of large data. 
- -**Design for idempotency** - Invoked functions should be idempotent since they might be retried. - -**Use hierarchical composition** - Break complex workflows into layers of functions, where each layer handles a specific level of abstraction. - -**Avoid deep nesting** - Don't create deeply nested invocation chains. Keep hierarchies shallow for better observability. - -**Log invocation boundaries** - Log when invoking functions and when receiving results for better debugging. - -**Consider cost implications** - Each invoke operation triggers a separate Lambda invocation, which adds to the cost of the workflow. - -**Mix durable and on-demand functions** - You can invoke both durable and regular Lambda functions. The orchestrator can be durable and compose regular on-demand functions. It makes the results of the invoked on-demand functions durable, so the invoked functions themselves don't need to be durable. Use durable functions for complex workflows and on-demand functions for simple operations. - -[↑ Back to top](#table-of-contents) - -## FAQ - -**Q: What's the difference between invoke and step?** - -A: `invoke()` calls another Lambda function (durable or on-demand), while `step()` executes code within the current function. Use invoke for function composition and step for checkpointing operations within a function. - -**Q: Can I invoke non-durable functions?** - -A: Yes, `context.invoke()` can call both durable functions and regular on-demand Lambda functions. The invoke operation works with any Lambda function that accepts and returns JSON-serializable data. - -**Q: How do I pass the result from one invoke to another?** - -A: Simply use the return value. 
The type of the return value is governed by the `serdes_result` configuration: - -```python -result1 = context.invoke("function-1", payload1) -result2 = context.invoke("function-2", result1) -``` - -**Q: What happens if an invoked function fails?** - -A: The error propagates to the calling function as a `CallableRuntimeError`. You can catch and handle it. - -**Q: Can I invoke the same function multiple times?** - -A: Yes, you can invoke the same function multiple times with different payloads or names. - -**Q: How do I invoke a function in a different AWS account?** - -A: The `function_name` parameter accepts function names in the same account. For cross-account invocations, you need appropriate IAM permissions and may need to use function ARNs (check AWS documentation for cross-account Lambda invocations). - -**Q: What's the maximum timeout I can set?** - -A: The timeout is limited by Lambda's maximum execution time (15 minutes). However, durable functions can run longer by suspending and resuming. - -**Q: Can I invoke functions in parallel?** - -A: Not directly with `context.invoke()`. For parallel execution, consider using `context.parallel()` with steps that perform invocations, or invoke multiple functions sequentially. - -**Q: How do I debug invoke operations?** - -A: Use the `name` parameter to identify operations in logs. Check CloudWatch logs for both the calling and invoked functions. - -**Q: What happens if I don't set a timeout?** - -A: The invoke operation waits indefinitely for the invoked function to complete. It's recommended to set timeouts for better error handling. - -**Q: What's the difference between context.invoke() and using boto3's Lambda client to invoke functions?** - -A: When you use `context.invoke()`, the SDK suspends your durable function's execution while waiting for the result. This means you don't pay for Lambda compute time while waiting. 
With boto3's Lambda client, your function stays active and consumes billable compute time while waiting for the response. Additionally, `context.invoke()` automatically checkpoints the operation, handles errors durably, and integrates with the durable execution lifecycle. - -[↑ Back to top](#table-of-contents) - -## Testing - -You can test invoke operations using the testing SDK. The test runner executes your function and lets you inspect invoke operations. - -### Basic invoke testing - -```python -import pytest -from aws_durable_execution_sdk_python_testing import InvocationStatus -from my_function import handler - -@pytest.mark.durable_execution( - handler=handler, - lambda_function_name="my_function", -) -def test_invoke(durable_runner): - """Test a function with invoke operations.""" - with durable_runner: - result = durable_runner.run( - input={"order_id": "order-123", "amount": 100.0}, - timeout=30, - ) - - # Check overall status - assert result.status is InvocationStatus.SUCCEEDED - - # Check final result - assert result.result["status"] == "completed" -``` - -### Inspecting invoke operations - -Use the result object to inspect invoke operations: - -```python -@pytest.mark.durable_execution( - handler=handler, - lambda_function_name="my_function", -) -def test_invoke_operations(durable_runner): - """Test and inspect invoke operations.""" - with durable_runner: - result = durable_runner.run(input={"user_id": "user-123"}, timeout=30) - - # Get all operations - operations = result.operations - - # Find invoke operations - invoke_ops = [op for op in operations if op.operation_type == "CHAINED_INVOKE"] - - # Verify invoke operations were created - assert len(invoke_ops) == 2 - - # Check specific invoke operation - validate_op = next(op for op in invoke_ops if op.name == "validate_order") - assert validate_op.status is InvocationStatus.SUCCEEDED -``` - -### Testing error handling - -Test that invoke errors are handled correctly: - -```python 
-@pytest.mark.durable_execution( - handler=handler_with_error_handling, - lambda_function_name="error_handler_function", -) -def test_invoke_error_handling(durable_runner): - """Test invoke error handling.""" - with durable_runner: - result = durable_runner.run(input={}, timeout=30) - - # Function should handle the error gracefully - assert result.status is InvocationStatus.SUCCEEDED - assert result.result["status"] == "failed" - assert "error" in result.result -``` - -### Testing timeouts - -Test that timeouts are handled correctly: - -```python -from aws_durable_execution_sdk_python.config import Duration, InvokeConfig - -@pytest.mark.durable_execution( - handler=handler_with_timeout, - lambda_function_name="timeout_function", -) -def test_invoke_timeout(durable_runner): - """Test invoke timeout handling.""" - with durable_runner: - result = durable_runner.run(input={}, timeout=60) - - # Check that timeout was handled - assert result.status is InvocationStatus.SUCCEEDED - assert result.result["status"] == "timeout" -``` - -### Mocking invoked functions - -When testing, you can mock the invoked functions to control their behavior: - -```python -from unittest.mock import Mock, patch - -@pytest.mark.durable_execution( - handler=handler, - lambda_function_name="my_function", -) -def test_invoke_with_mock(durable_runner): - """Test invoke with mocked function.""" - # The testing framework handles invocations internally - # You can test the orchestration logic without deploying all functions - - with durable_runner: - result = durable_runner.run( - input={"order_id": "order-123"}, - timeout=30, - ) - - # Verify the orchestration logic - assert result.status is InvocationStatus.SUCCEEDED -``` - -For more testing patterns, see: -- [Basic tests](../testing-patterns/basic-tests.md) - Simple test examples -- [Complex workflows](../testing-patterns/complex-workflows.md) - Multi-step workflow testing -- [Best practices](../testing-patterns/best-practices.md) - Testing 
recommendations - -[↑ Back to top](#table-of-contents) - -## See also - -- [Steps](steps.md) - Execute code with checkpointing -- [Child contexts](child-contexts.md) - Organize operations hierarchically -- [Parallel operations](parallel.md) - Execute multiple operations concurrently -- [Error handling](../advanced/error-handling.md) - Handle errors in durable functions -- [DurableContext API](../api-reference/context.md) - Complete context reference - -[↑ Back to top](#table-of-contents) - -## License - -See the [LICENSE](../../LICENSE) file for our project's licensing. - -[↑ Back to top](#table-of-contents) diff --git a/docs/core/logger.md b/docs/core/logger.md deleted file mode 100644 index 6236a057..00000000 --- a/docs/core/logger.md +++ /dev/null @@ -1,737 +0,0 @@ -# Logger integration - -The Durable Execution SDK automatically enriches your logs with execution context, making it easy to trace operations across checkpoints and replays. You can use the built-in logger or integrate with Powertools for AWS Lambda (Python) for advanced structured logging. 
- -## Table of contents - -- [Key features](#key-features) -- [Terminology](#terminology) -- [Getting started](#getting-started) -- [Method signature](#method-signature) -- [Automatic context enrichment](#automatic-context-enrichment) -- [Adding custom metadata](#adding-custom-metadata) -- [Logger inheritance in child contexts](#logger-inheritance-in-child-contexts) -- [Integration with Powertools for AWS Lambda (Python)](#integration-with-powertools-for-aws-lambda-python) -- [Replay behavior and log deduplication](#replay-behavior-and-log-deduplication) -- [Best practices](#best-practices) -- [Enabling debug logging](#enabling-debug-logging) -- [FAQ](#faq) -- [Testing logger integration](#testing-logger-integration) -- [See also](#see-also) - -[← Back to main index](../index.md) - -## Key features - -- Automatic log deduplication during replays - logs from completed operations don't repeat -- Automatic enrichment with execution context (execution ARN, parent ID, operation name, attempt number) -- Logger inheritance in child contexts for hierarchical tracing -- Compatible with Python's standard logging and Powertools for AWS Lambda (Python) -- Support for custom metadata through the `extra` parameter -- All standard log levels: debug, info, warning, error, exception - -[↑ Back to top](#table-of-contents) - -## Terminology - -**Log deduplication** - The SDK prevents duplicate logs during replays by tracking completed operations. When your function is checkpointed and resumed, logs from already-completed operations aren't emitted again, keeping your CloudWatch logs clean. - -**Context enrichment** - The automatic addition of execution metadata (execution ARN, parent ID, operation name, attempt number) to log entries. The SDK handles this for you, so every log includes tracing information. - -**Logger inheritance** - When you create a child context, it inherits the parent's logger and adds its own context information. 
This creates a hierarchical logging structure that mirrors your execution flow. - -**Extra metadata** - Additional key-value pairs you can add to log entries using the `extra` parameter. These merge with the automatic context enrichment. - -[↑ Back to top](#table-of-contents) - -## Getting started - -Access the logger through `context.logger` in your durable functions: - -```python -from aws_durable_execution_sdk_python import DurableContext, durable_execution - -@durable_execution -def handler(event: dict, context: DurableContext) -> str: - # Log at the top level - context.logger.info("Starting workflow", extra={"event_id": event.get("id")}) - - # Execute a step - result: str = context.step( - lambda _: "processed", - name="process_data", - ) - - context.logger.info("Workflow completed", extra={"result": result}) - return result -``` - -The logger automatically includes execution context in every log entry. - -### Integration with Lambda Advanced Log Controls - -Durable functions work with Lambda's Advanced Log Controls. You can configure your Lambda function to filter logs by level, which helps reduce CloudWatch Logs costs and noise. When you set a log level filter (like INFO or ERROR), logs below that level are automatically ignored. - -For example, if you set your Lambda function's log level to INFO, debug logs won't appear in CloudWatch Logs: - -```python -@durable_execution -def handler(event: dict, context: DurableContext) -> str: - context.logger.debug("This won't appear if log level is INFO or higher") - context.logger.info("This will appear") - - result: str = context.step( - lambda _: "processed", - name="process_data", - ) - - return result -``` - -Learn more about configuring log levels in the [Lambda Advanced Log Controls documentation](https://docs.aws.amazon.com/lambda/latest/dg/monitoring-cloudwatchlogs.html#monitoring-cloudwatchlogs-advanced). 
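The effect of a level filter can be reproduced locally with Python's standard `logging` module. The following is only a sketch of level-based filtering, not of how Lambda applies Advanced Log Controls; the logger name `alc_demo`, the helper `emit_with_level`, and the in-memory stream are assumptions made for illustration.

```python
import io
import logging


def emit_with_level(level: int) -> list[str]:
    """Emit one DEBUG and one INFO record through a logger set to `level`."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s:%(message)s"))

    logger = logging.getLogger("alc_demo")  # hypothetical logger name
    logger.handlers = [handler]
    logger.propagate = False
    logger.setLevel(level)

    logger.debug("debug details")
    logger.info("workflow progress")

    logger.handlers = []
    return stream.getvalue().splitlines()


# With the level at INFO, the DEBUG record is dropped before it is written.
info_lines = emit_with_level(logging.INFO)
# With the level at DEBUG, both records are written.
debug_lines = emit_with_level(logging.DEBUG)

print(info_lines)
print(debug_lines)
```

In a deployed function the filter is configured on the Lambda function itself rather than in code, so treat this purely as a mental model.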
- -[↑ Back to top](#table-of-contents) - -## Method signature - -The logger provides standard logging methods: - -```python -context.logger.debug(msg, *args, extra=None) -context.logger.info(msg, *args, extra=None) -context.logger.warning(msg, *args, extra=None) -context.logger.error(msg, *args, extra=None) -context.logger.exception(msg, *args, extra=None) -``` - -**Parameters:** -- `msg` (object) - The log message. Can include format placeholders. -- `*args` (object) - Arguments for message formatting. -- `extra` (dict[str, object] | None) - Optional dictionary of additional fields to include in the log entry. - -[↑ Back to top](#table-of-contents) - -## Automatic context enrichment - -The SDK automatically enriches logs with execution metadata: - -```python -@durable_execution -def handler(event: dict, context: DurableContext) -> str: - # This log includes: execution_arn - context.logger.info("Top-level log") - - result: str = context.step( - lambda _: "processed", - name="process_data", - ) - - # This log includes: execution_arn, parent_id, name, attempt - context.logger.info("Step completed") - - return result -``` - -**Enriched fields:** -- `execution_arn` - Always present, identifies the durable execution -- `parent_id` - Present in child contexts, identifies the parent operation -- `name` - Present when the operation has a name -- `attempt` - Present in steps, shows the retry attempt number - -[↑ Back to top](#table-of-contents) - -## Adding custom metadata - -Use the `extra` parameter to add custom fields to your logs: - -```python -@durable_execution -def handler(event: dict, context: DurableContext) -> str: - order_id = event.get("order_id") - - context.logger.info( - "Processing order", - extra={ - "order_id": order_id, - "customer_id": event.get("customer_id"), - "priority": "high" - } - ) - - result: str = context.step( - lambda _: f"order-{order_id}-processed", - name="process_order", - ) - - context.logger.info( - "Order completed", - 
extra={"order_id": order_id, "result": result} - ) - - return result -``` - -Custom fields merge with the automatic context enrichment, so your logs include both execution metadata and your custom data. - -[↑ Back to top](#table-of-contents) - -## Logger inheritance in child contexts - -Child contexts inherit the parent's logger and add their own context: - -```python -from aws_durable_execution_sdk_python import ( - DurableContext, - durable_execution, - durable_with_child_context, -) - -@durable_with_child_context -def child_workflow(ctx: DurableContext) -> str: - # Logger includes parent_id for the child context - ctx.logger.info("Running in child context") - - # Step in child context has nested parent_id - child_result: str = ctx.step( - lambda _: "child-processed", - name="child_step", - ) - - ctx.logger.info("Child workflow completed", extra={"result": child_result}) - return child_result - -@durable_execution -def handler(event: dict, context: DurableContext) -> str: - # Top-level logger: only execution_arn - context.logger.info("Starting workflow", extra={"event_id": event.get("id")}) - - # Child context inherits logger and adds its own parent_id - result: str = context.run_in_child_context( - child_workflow(), - name="child_workflow" - ) - - context.logger.info("Workflow completed", extra={"result": result}) - return result -``` - -This creates a hierarchical logging structure where you can trace operations from parent to child contexts. - -[↑ Back to top](#table-of-contents) - -## Integration with Powertools for AWS Lambda (Python) - -The SDK is compatible with Powertools for AWS Lambda (Python), giving you structured logging with JSON output and additional features. - -**Powertools for AWS Lambda (Python) benefits:** -- JSON structured logging for CloudWatch Logs Insights -- Automatic Lambda context injection (request ID, function name, etc.) 
-- Correlation IDs for distributed tracing -- Log sampling for cost optimization -- Integration with X-Ray tracing - -### Using Powertools for AWS Lambda (Python) directly - -You can use Powertools for AWS Lambda (Python) directly in your durable functions: - -```python -from aws_lambda_powertools import Logger -from aws_durable_execution_sdk_python import DurableContext, durable_execution - -logger = Logger(service="order-processing") - -@durable_execution -def handler(event: dict, context: DurableContext) -> str: - logger.info("Starting workflow") - - result: str = context.step( - lambda _: "processed", - name="process_data", - ) - - logger.info("Workflow completed", extra={"result": result}) - return result -``` - -This gives you all Powertools for AWS Lambda (Python) features like JSON logging and correlation IDs. - -### Integrating with context.logger - -For better integration with durable execution, set Powertools for AWS Lambda (Python) on the context: - -```python -from aws_lambda_powertools import Logger -from aws_durable_execution_sdk_python import DurableContext, durable_execution - -logger = Logger(service="order-processing") - -@durable_execution -def handler(event: dict, context: DurableContext) -> str: - # Set Powertools for AWS Lambda (Python) on the context - context.set_logger(logger) - - # Now context.logger uses Powertools for AWS Lambda (Python) with automatic enrichment - context.logger.info("Starting workflow", extra={"event_id": event.get("id")}) - - result: str = context.step( - lambda _: "processed", - name="process_data", - ) - - context.logger.info("Workflow completed", extra={"result": result}) - return result -``` - -**Benefits of using context.logger:** -- All Powertools for AWS Lambda (Python) features (JSON logging, correlation IDs, etc.) 
-- Automatic SDK context enrichment (execution_arn, parent_id, name, attempt) -- Log deduplication during replays (see next section) - -The SDK's context enrichment (execution_arn, parent_id, name, attempt) merges with Powertools for AWS Lambda (Python) fields (service, request_id, function_name, etc.) in the JSON output. - -[↑ Back to top](#table-of-contents) - -## Replay behavior and log deduplication - -A critical feature of `context.logger` is that it prevents duplicate logs during replays. When your durable function is checkpointed and resumed, the SDK replays your code to reach the next operation, but logs from completed operations aren't emitted again. - -### How context.logger prevents duplicate logs - -When you use `context.logger`, the SDK tracks which operations have completed and suppresses logs during replay: - -```python -from aws_durable_execution_sdk_python import DurableContext, durable_execution - -@durable_execution -def handler(event: dict, context: DurableContext) -> str: - # This log appears only once, even if the function is replayed - context.logger.info("Starting workflow") - - # Step 1 - logs appear only once - result1: str = context.step( - lambda _: "step1-done", - name="step_1", - ) - context.logger.info("Step 1 completed", extra={"result": result1}) - - # Step 2 - logs appear only once - result2: str = context.step( - lambda _: "step2-done", - name="step_2", - ) - context.logger.info("Step 2 completed", extra={"result": result2}) - - return f"{result1}-{result2}" -``` - -**What happens during replay:** -1. First invocation: All logs appear (starting workflow, step 1 completed, step 2 completed) -2. After checkpoint and resume: Only new logs appear (step 2 completed if step 1 was checkpointed) -3. 
Your CloudWatch logs show each message only once, making them clean and easy to read - -### Logging behavior with direct logger usage - -When you use a logger directly (not through `context.logger`), logs will be emitted on every replay: - -```python -from aws_lambda_powertools import Logger -from aws_durable_execution_sdk_python import DurableContext, durable_execution - -logger = Logger(service="order-processing") - -@durable_execution -def handler(event: dict, context: DurableContext) -> str: - # This log appears on every replay - logger.info("Starting workflow") - - result1: str = context.step( - lambda _: "step1-done", - name="step_1", - ) - # This log appears on every replay after step 1 - logger.info("Step 1 completed") - - result2: str = context.step( - lambda _: "step2-done", - name="step_2", - ) - # This log appears only once (no more replays after this) - logger.info("Step 2 completed") - - return f"{result1}-{result2}" -``` - -**What happens during replay:** -1. First invocation: All logs appear once -2. After checkpoint and resume: "Starting workflow" and "Step 1 completed" appear again -3. Your CloudWatch logs show duplicate entries for replayed operations - -### Using context.logger with Powertools for AWS Lambda (Python) - -To get both log deduplication and Powertools for AWS Lambda (Python) features, set the Powertools Logger on the context: - -```python -from aws_lambda_powertools import Logger -from aws_durable_execution_sdk_python import DurableContext, durable_execution - -logger = Logger(service="order-processing") - -@durable_execution -def handler(event: dict, context: DurableContext) -> str: - # Set Powertools for AWS Lambda (Python) on the context - context.set_logger(logger) - - # Now you get BOTH: - # - Powertools for AWS Lambda (Python) features (JSON logging, correlation IDs, etc.) 
- # - Log deduplication during replays - context.logger.info("Starting workflow") - - result1: str = context.step( - lambda _: "step1-done", - name="step_1", - ) - context.logger.info("Step 1 completed", extra={"result": result1}) - - result2: str = context.step( - lambda _: "step2-done", - name="step_2", - ) - context.logger.info("Step 2 completed", extra={"result": result2}) - - return f"{result1}-{result2}" -``` - -**Benefits of this approach:** -- Clean logs without duplicates during replays -- JSON structured logging from Powertools for AWS Lambda (Python) -- Automatic context enrichment from the SDK (execution_arn, parent_id, name, attempt) -- Lambda context injection from Powertools for AWS Lambda (Python) (request_id, function_name, etc.) -- Correlation IDs and X-Ray integration from Powertools for AWS Lambda (Python) - -### When you might see duplicate logs - -You'll still see duplicate logs in these scenarios: -- Logs from operations that fail and retry (this is expected and helpful for debugging) -- Logs outside of durable execution context (before `@durable_execution` decorator runs) -- Logs from code that runs during replay before reaching a checkpoint - -This is normal behavior and helps you understand the execution flow. 
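The suppression described in this section can be pictured with a small stand-alone simulation. This is a toy model under stated assumptions, not the SDK's implementation: `ReplayAwareLogger`, `ListHandler`, `invocation`, and the set of completed operation names are all hypothetical names introduced here.

```python
import logging


class ListHandler(logging.Handler):
    """Collect log messages in memory so they can be inspected."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


class ReplayAwareLogger:
    """Skip messages that belong to operations completed in an earlier invocation."""

    def __init__(self, logger: logging.Logger, completed: set[str]) -> None:
        self._logger = logger
        self._completed = completed  # operations already checkpointed

    def info_for(self, operation: str, msg: str) -> None:
        if operation in self._completed:
            return  # replayed operation: its log was emitted earlier
        self._logger.info(msg)


def invocation(log: ReplayAwareLogger, steps: list[str]) -> None:
    """Run the workflow code from the top, as a replay would."""
    for step in steps:
        log.info_for(step, f"{step} completed")


handler = ListHandler()
base = logging.getLogger("replay_demo")  # hypothetical logger name
base.handlers = [handler]
base.propagate = False
base.setLevel(logging.INFO)

# First invocation: nothing is checkpointed yet, and only step_1 runs.
invocation(ReplayAwareLogger(base, completed=set()), ["step_1"])
# Second invocation: the code replays from the top; step_1 is already checkpointed.
invocation(ReplayAwareLogger(base, completed={"step_1"}), ["step_1", "step_2"])

messages = handler.messages
print(messages)
```

Each message appears once even though the code for `step_1` ran in both simulated invocations, which mirrors the behavior described for `context.logger`.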
- -[↑ Back to top](#table-of-contents) - -## Best practices - -**Use structured logging with extra fields** - -Add context-specific data through the `extra` parameter rather than embedding it in the message string: - -```python -# Good - structured and queryable -context.logger.info("Order processed", extra={"order_id": order_id, "amount": 100}) - -# Avoid - harder to query -context.logger.info(f"Order {order_id} processed with amount 100") -``` - -**Log at appropriate levels** - -- `debug` - Detailed diagnostic information for troubleshooting -- `info` - General informational messages about workflow progress -- `warning` - Unexpected situations that don't prevent execution -- `error` - Error conditions that may need attention -- `exception` - Exceptions with stack traces (use in except blocks) - -**Include business context in logs** - -Add identifiers that help you trace business operations: - -```python -context.logger.info( - "Processing payment", - extra={ - "order_id": order_id, - "customer_id": customer_id, - "payment_method": "credit_card" - } -) -``` - -**Use Powertools for AWS Lambda (Python) for production** - -For production workloads, use Powertools for AWS Lambda (Python) to get JSON structured logging and CloudWatch Logs Insights integration: - -```python -from aws_lambda_powertools import Logger - -logger = Logger(service="my-service") - -@durable_execution -def handler(event: dict, context: DurableContext) -> str: - context.set_logger(logger) - # Now you get JSON logs with all Powertools for AWS Lambda (Python) features - context.logger.info("Processing started") -``` - -**Don't log sensitive data** - -Avoid logging sensitive information like passwords, tokens, or personal data: - -```python -# Good - log identifiers only -context.logger.info("User authenticated", extra={"user_id": user_id}) - -# Avoid - don't log sensitive data -context.logger.info("User authenticated", extra={"password": password}) -``` - -[↑ Back to top](#table-of-contents) - -## 
Enabling debug logging - -The SDK logs internally using Python's standard `logging` module. To see these logs, set `ApplicationLogLevel: DEBUG` in [Advanced logging controls](https://docs.aws.amazon.com/lambda/latest/dg/monitoring-cloudwatchlogs-advanced.html). - -Advanced logging controls filters logs before they reach CloudWatch. If you set DEBUG level in code but leave Advanced logging controls at INFO, your debug logs will be dropped. You must configure the level in Advanced logging controls - it auto-patches all loggers, so you don't need to configure log levels in code. - -```mermaid -flowchart LR - A[Logger emits DEBUG] --> B{Advanced Logging Controls} - B -->|ApplicationLogLevel = DEBUG| C[CloudWatch ✓] - B -->|ApplicationLogLevel = INFO| D[Dropped ✗] -``` - -**Important:** DEBUG level applies to all libraries including botocore. Since the SDK uses boto3 internally, this will flood your logs with HTTP request/response details. Silence botocore in your code: - -```python -import logging - -# Silence botocore/urllib3 noise -logging.getLogger("botocore").setLevel(logging.WARNING) -logging.getLogger("urllib3").setLevel(logging.WARNING) -``` - -Configure Advanced logging controls via SAM/CloudFormation: - -```yaml -# SAM template -Resources: - MyFunction: - Type: AWS::Serverless::Function - Properties: - LoggingConfig: - LogFormat: JSON - ApplicationLogLevel: DEBUG -``` - -Or in the Lambda console under Configuration → Monitoring and operations tools → Logging configuration. - -### Selective logging - -Python loggers are hierarchical. Silencing `aws_durable_execution_sdk_python` silences all SDK modules. 
To keep some modules at DEBUG while silencing others: - -```python -import logging - -# Silence all SDK logs -logging.getLogger("aws_durable_execution_sdk_python").setLevel(logging.WARNING) - -# Or silence specific modules only -logging.getLogger("aws_durable_execution_sdk_python.state").setLevel(logging.WARNING) -logging.getLogger("aws_durable_execution_sdk_python.concurrency").setLevel(logging.WARNING) -``` - -SDK logger namespaces: - -| Namespace | Description | -|-----------|-------------| -| `aws_durable_execution_sdk_python` | Root - silences all SDK logs | -| `aws_durable_execution_sdk_python.state` | Checkpoint and replay state management | -| `aws_durable_execution_sdk_python.execution` | Durable execution lifecycle | -| `aws_durable_execution_sdk_python.context` | DurableContext operations | -| `aws_durable_execution_sdk_python.lambda_service` | Lambda API calls | -| `aws_durable_execution_sdk_python.serdes` | Serialization/deserialization | -| `aws_durable_execution_sdk_python.concurrency` | Parallel and map execution | -| `aws_durable_execution_sdk_python.operation.step` | Step operations | -| `aws_durable_execution_sdk_python.operation.wait` | Wait operations | -| `aws_durable_execution_sdk_python.operation.invoke` | Invoke operations | -| `aws_durable_execution_sdk_python.operation.child` | Child context operations | -| `aws_durable_execution_sdk_python.operation.parallel` | Parallel operations | -| `aws_durable_execution_sdk_python.operation.map` | Map operations | - -[↑ Back to top](#table-of-contents) - -## FAQ - -**Q: Does logging work during replays?** - -Yes, but `context.logger` prevents duplicate logs. When you use `context.logger`, the SDK tracks completed operations and suppresses their logs during replay. This keeps your CloudWatch logs clean and easy to read. If you use a logger directly (not through `context.logger`), you'll see duplicate log entries on every replay. 
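
The replay behavior described in the answer above can be pictured with a plain `logging.Filter`. This is a stdlib-only sketch, not the SDK's implementation: `ReplayState`, `ReplayAwareFilter`, `ListHandler`, and the `is_replaying` flag are hypothetical names invented for illustration, and the real `context.logger` may track replay differently.

```python
import logging


class ReplayState:
    """Hypothetical stand-in for the SDK's replay tracking (illustration only)."""

    def __init__(self) -> None:
        self.is_replaying = False


class ReplayAwareFilter(logging.Filter):
    """Drop records emitted while a completed operation is being replayed."""

    def __init__(self, state: ReplayState) -> None:
        super().__init__()
        self._state = state

    def filter(self, record: logging.LogRecord) -> bool:
        # Returning False tells the logging module to discard the record.
        return not self._state.is_replaying


class ListHandler(logging.Handler):
    """Collect messages in memory so the effect is easy to inspect."""

    def __init__(self) -> None:
        super().__init__()
        self.messages = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


state = ReplayState()
handler = ListHandler()

demo_logger = logging.getLogger("replay_demo.filtered")
demo_logger.setLevel(logging.INFO)
demo_logger.propagate = False
demo_logger.addHandler(handler)
demo_logger.addFilter(ReplayAwareFilter(state))

# First invocation: the operation really runs, so the message is logged.
demo_logger.info("Order processed")

# Replay: the same line executes again, but the filter suppresses it.
state.is_replaying = True
demo_logger.info("Order processed")
state.is_replaying = False

captured = handler.messages
```

A logger without such a filter would record the message twice, which is the duplicate-entry behavior the answer warns about for loggers used outside `context.logger`.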
- -**Q: How do I filter logs by execution?** - -Use the `execution_arn` field that's automatically added to every log entry. In CloudWatch Logs Insights: - -``` -fields @timestamp, @message, execution_arn -| filter execution_arn = "arn:aws:lambda:us-east-1:123456789012:function:my-function:execution-id" -| sort @timestamp asc -``` - -**Q: Can I use a custom logger?** - -Yes. Any logger that implements the `LoggerInterface` protocol works with the SDK. Use `context.set_logger()` to set your custom logger. - -The protocol is defined in `aws_durable_execution_sdk_python.types`: - -```python -from typing import Protocol -from collections.abc import Mapping - -class LoggerInterface(Protocol): - def debug( - self, msg: object, *args: object, extra: Mapping[str, object] | None = None - ) -> None: ... - - def info( - self, msg: object, *args: object, extra: Mapping[str, object] | None = None - ) -> None: ... - - def warning( - self, msg: object, *args: object, extra: Mapping[str, object] | None = None - ) -> None: ... - - def error( - self, msg: object, *args: object, extra: Mapping[str, object] | None = None - ) -> None: ... - - def exception( - self, msg: object, *args: object, extra: Mapping[str, object] | None = None - ) -> None: ... -``` - -Any logger with these methods (like Python's standard `logging.Logger` or Powertools Logger) is compatible. - -**Q: What's the difference between the SDK logger and Powertools for AWS Lambda (Python)?** - -The SDK provides a logger wrapper that adds execution context. Powertools for AWS Lambda (Python) provides structured JSON logging and Lambda-specific features. You can use them together - set the Powertools Logger on the context, and the SDK will enrich it with execution metadata. - -**Q: Do child contexts get their own logger?** - -Child contexts inherit the parent's logger and add their own `parent_id` to the context. This creates a hierarchical logging structure where you can trace operations from parent to child. 
- -**Q: How do I change the log level?** - -If using Python's standard logging, configure it before your handler: - -```python -import logging -logging.basicConfig(level=logging.DEBUG) -``` - -If using Powertools for AWS Lambda (Python), set the level when creating the logger: - -```python -from aws_lambda_powertools import Logger -logger = Logger(service="my-service", level="DEBUG") -``` - -**Q: Can I access the underlying logger?** - -Yes. Use `context.logger.get_logger()` to access the underlying logger instance if you need to call methods not in the `LoggerInterface`. - -[↑ Back to top](#table-of-contents) - -## Testing logger integration - -You can verify that your durable functions log correctly by capturing log output in tests. - -### Example test - -```python -import pytest -from aws_durable_execution_sdk_python.execution import InvocationStatus - -from src.logger_example import logger_example -from test.conftest import deserialize_operation_payload - -@pytest.mark.durable_execution( - handler=logger_example.handler, - lambda_function_name="logger example", -) -def test_logger_example(durable_runner): - """Test logger example.""" - with durable_runner: - result = durable_runner.run(input={"id": "test-123"}, timeout=10) - - assert result.status is InvocationStatus.SUCCEEDED - assert deserialize_operation_payload(result.result) == "processed-child-processed" -``` - -### Verifying log output - -To verify specific log messages, capture log output using Python's logging test utilities: - -```python -import logging -import pytest -from aws_durable_execution_sdk_python.execution import InvocationStatus - -@pytest.mark.durable_execution(handler=my_handler) -def test_logging_output(durable_runner, caplog): - """Test that expected log messages are emitted.""" - with caplog.at_level(logging.INFO): - with durable_runner: - result = durable_runner.run(input={"id": "test-123"}, timeout=10) - - assert result.status is InvocationStatus.SUCCEEDED - - # Verify log messages - 
assert "Starting workflow" in caplog.text - assert "Workflow completed" in caplog.text -``` - -### Testing with Powertools for AWS Lambda (Python) - -When testing with Powertools for AWS Lambda (Python), you can verify structured log output: - -```python -import json -import logging -import pytest -from aws_lambda_powertools import Logger - -@pytest.mark.durable_execution(handler=my_handler) -def test_powertools_logging(durable_runner, caplog): - """Test Powertools for AWS Lambda (Python) integration.""" - logger = Logger(service="test-service") - - with caplog.at_level(logging.INFO): - with durable_runner: - result = durable_runner.run(input={"id": "test-123"}, timeout=10) - - # Parse JSON log entries - for record in caplog.records: - if hasattr(record, 'msg'): - try: - log_entry = json.loads(record.msg) - # Verify Powertools for AWS Lambda (Python) fields - assert "service" in log_entry - # Verify SDK enrichment fields - assert "execution_arn" in log_entry - except json.JSONDecodeError: - pass # Not a JSON log entry -``` - -[↑ Back to top](#table-of-contents) - -## See also - -- [Steps](steps.md) - Learn about step operations that use logger enrichment -- [Child contexts](child-contexts.md) - Understand logger inheritance in nested contexts -- [Getting started](../getting-started.md) - Basic durable function setup -- [Powertools for AWS Lambda (Python) - Logger](https://docs.powertools.aws.dev/lambda/python/latest/core/logger/) - Powertools Logger documentation - -[↑ Back to top](#table-of-contents) - -## License - -See the LICENSE file for our project's licensing. 
- -[↑ Back to top](#table-of-contents) diff --git a/docs/core/map.md b/docs/core/map.md deleted file mode 100644 index 2e671192..00000000 --- a/docs/core/map.md +++ /dev/null @@ -1,589 +0,0 @@ -# Map Operations - -## Table of Contents - -- [What are map operations?](#what-are-map-operations) -- [Terminology](#terminology) -- [Key features](#key-features) -- [Getting started](#getting-started) -- [Method signature](#method-signature) -- [Map function signature](#map-function-signature) -- [Configuration](#configuration) -- [Advanced patterns](#advanced-patterns) -- [Best practices](#best-practices) -- [Performance tips](#performance-tips) -- [FAQ](#faq) -- [Testing](#testing) -- [See also](#see-also) - -[← Back to main index](../index.md) - -## Terminology - -**Map operation** - A durable operation that processes a collection of items in parallel, where each item is processed independently and checkpointed. Created using `context.map()`. - -**Map function** - A function that processes a single item from the collection. Receives the context, item, index, and full collection as parameters. - -**BatchResult** - The result type returned by map operations, containing results from all processed items with success/failure status. - -**Concurrency control** - Limiting how many items process simultaneously using `max_concurrency` in `MapConfig`. - -**Completion criteria** - Rules that determine when a map operation succeeds or fails based on individual item results. - -[↑ Back to top](#table-of-contents) - -## What are map operations? - -Map operations let you process collections durably by applying a function to each item in parallel. Each item's processing is checkpointed independently, so if your function is interrupted, completed items don't need to be reprocessed. 
- -Use map operations to: -- Transform collections with automatic checkpointing -- Process lists of items in parallel -- Handle large datasets with resilience -- Control concurrency behavior -- Define custom success/failure criteria - -Map operations use `context.map()` to process collections efficiently. Each item becomes an independent operation that executes in parallel with other items. - -[↑ Back to top](#table-of-contents) - -## Key features - -- **Parallel processing** - Items process concurrently by default -- **Independent checkpointing** - Each item's result is saved separately -- **Partial completion** - Completed items don't reprocess on replay -- **Concurrency control** - Limit simultaneous processing with `max_concurrency` -- **Flexible completion** - Define custom success/failure criteria -- **Result ordering** - Results maintain the same order as inputs - -[↑ Back to top](#table-of-contents) - -## Getting started - -Here's a simple example of processing a collection: - -```python -from aws_durable_execution_sdk_python import ( - DurableContext, - durable_execution, - BatchResult, -) - -def square(context: DurableContext, item: int, index: int, items: list[int]) -> int: - """Square a number.""" - return item * item - -@durable_execution -def handler(event: dict, context: DurableContext) -> dict: - """Process a list of items using map operations.""" - items = [1, 2, 3, 4, 5] - - result = context.map(items, square) - # Convert to dict for JSON serialization (BatchResult is not JSON serializable) - return result.to_dict() -``` - -When this function runs: -1. Each item is processed in parallel -2. The `square` function is called for each item -3. Each result is checkpointed independently -4. The function returns a dict with results `[1, 4, 9, 16, 25]` - -If the function is interrupted after processing items 0-2, it resumes at item 3 without reprocessing the first three items. 
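
The resume-after-interruption behavior can be sketched without the SDK. The example below is a minimal, sequential illustration that assumes an in-memory dict as the checkpoint store; `checkpoints`, `calls`, `checkpointed_map`, and the `fail_at` parameter are hypothetical names, and the real `context.map()` runs items in parallel and persists checkpoints through the durable execution service.

```python
from typing import Callable, Optional

# Hypothetical in-memory checkpoint store: item index -> saved result.
checkpoints = {}
# Records which indexes were actually computed (not served from a checkpoint).
calls = []


def square(item: int) -> int:
    return item * item


def checkpointed_map(
    items: list,
    func: Callable[[int], int],
    fail_at: Optional[int] = None,
) -> list:
    """Apply func to each item, skipping indexes that already have a checkpoint."""
    for index, item in enumerate(items):
        if index in checkpoints:
            continue  # completed before the interruption; reuse the saved result
        if fail_at is not None and index == fail_at:
            raise RuntimeError(f"interrupted before item {index}")
        calls.append(index)
        checkpoints[index] = func(item)
    return [checkpoints[i] for i in range(len(items))]


items = [1, 2, 3, 4, 5]

try:
    # Items 0-2 complete and are checkpointed, then the run is interrupted.
    checkpointed_map(items, square, fail_at=3)
except RuntimeError:
    pass

# The "replay": items 0-2 are read from checkpoints, work resumes at item 3.
results = checkpointed_map(items, square)
```

Each index appears in `calls` exactly once, which is the property the paragraph above describes: work finished before the interruption is not repeated.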
- -[↑ Back to top](#table-of-contents) - -## Method signature - -### context.map() - -```python -def map( - inputs: Sequence[U], - func: Callable[[DurableContext, U, int, Sequence[U]], T], - name: str | None = None, - config: MapConfig | None = None, -) -> BatchResult[T] -``` - -**Parameters:** - -- `inputs` - A sequence of items to process (list, tuple, or any sequence type). -- `func` - A callable that processes each item. See [Map function signature](#map-function-signature) for details. -- `name` (optional) - A name for the map operation, useful for debugging and testing. -- `config` (optional) - A `MapConfig` object to configure concurrency and completion criteria. - -**Returns:** A `BatchResult[T]` containing the results from processing all items. - -**Raises:** Exceptions based on the completion criteria defined in `MapConfig`. - -[↑ Back to top](#table-of-contents) - -## Map function signature - -The map function receives four parameters: - -```python -def process_item( - context: DurableContext, - item: U, - index: int, - items: Sequence[U] -) -> T: - """Process a single item from the collection.""" - # Your processing logic here - return result -``` - -**Parameters:** - -- `context` - A `DurableContext` for the item's processing. Use this to call steps, waits, or other operations. -- `item` - The current item being processed. -- `index` - The zero-based index of the item in the original collection. -- `items` - The full collection of items being processed. - -**Returns:** The result of processing the item. - -### Example - -```python -def validate_email( - context: DurableContext, - item: str, - index: int, - items: list[str] -) -> dict: - """Validate an email address.""" - is_valid = "@" in item and "." 
in item - return { - "email": item, - "valid": is_valid, - "position": index, - "total": len(items) - } - -@durable_execution -def handler(event: dict, context: DurableContext) -> dict: - emails = ["jane_doe@example.com", "john_doe@example.org", "invalid"] - result = context.map(emails, validate_email) - return result.to_dict() -``` - -[↑ Back to top](#table-of-contents) - -## Configuration - -Configure map behavior using `MapConfig`: - -```python -from aws_durable_execution_sdk_python import ( - DurableContext, - durable_execution, - BatchResult, -) -from aws_durable_execution_sdk_python.config import ( - MapConfig, - CompletionConfig, -) - -def process_item(context: DurableContext, item: int, index: int, items: list[int]) -> dict: - """Process a single item.""" - return {"item": item, "squared": item * item} - -@durable_execution -def handler(event: dict, context: DurableContext) -> dict: - items = list(range(100)) - - # Configure map operation - config = MapConfig( - max_concurrency=10, # Process 10 items at a time - completion_config=CompletionConfig.all_successful(), # Require all to succeed - ) - - result = context.map(items, process_item, name="process_numbers", config=config) - return result.to_dict() -``` - -### MapConfig parameters - -**max_concurrency** - Maximum number of items to process concurrently. If `None`, all items process in parallel. Use this to control resource usage. - -**completion_config** - Defines when the map operation succeeds or fails: -- `CompletionConfig()` - Default, allows any number of failures -- `CompletionConfig.all_successful()` - Requires all items to succeed -- `CompletionConfig(min_successful=N)` - Requires at least N items to succeed -- `CompletionConfig(tolerated_failure_count=N)` - Fails after N failures -- `CompletionConfig(tolerated_failure_percentage=X)` - Fails if more than X% fail - -**serdes** - Custom serialization for the entire `BatchResult`. If `None`, uses JSON serialization. 
- -**item_serdes** - Custom serialization for individual item results. If `None`, uses JSON serialization. - -**summary_generator** - Function to generate compact summaries for large results (>256KB). - -[↑ Back to top](#table-of-contents) - -## Advanced patterns - -### Concurrency control - -Limit how many items process simultaneously: - -```python -from aws_durable_execution_sdk_python.config import MapConfig - -def fetch_data(context: DurableContext, url: str, index: int, urls: list[str]) -> dict: - """Fetch data from a URL.""" - # Network call that might be rate-limited - return {"url": url, "data": "..."} - -@durable_execution -def handler(event: dict, context: DurableContext) -> dict: - urls = [f"https://example.com/api/{i}" for i in range(100)] - - # Process only 5 URLs at a time - config = MapConfig(max_concurrency=5) - - result = context.map(urls, fetch_data, config=config) - return result.to_dict() -``` - -### Custom completion criteria - -Define when the map operation should succeed or fail: - -```python -from aws_durable_execution_sdk_python.config import MapConfig, CompletionConfig - -def process_item(context: DurableContext, item: int, index: int, items: list[int]) -> dict: - """Process an item that might fail.""" - # Processing that might fail - if item % 7 == 0: - raise ValueError(f"Item {item} failed") - return {"item": item, "processed": True} - -@durable_execution -def handler(event: dict, context: DurableContext) -> dict: - items = list(range(20)) - - # Succeed if at least 15 items succeed, fail after 5 failures - config = MapConfig( - completion_config=CompletionConfig( - min_successful=15, - tolerated_failure_count=5, - ) - ) - - result = context.map(items, process_item, config=config) - return result.to_dict() -``` - -### Using context operations in map functions - -Call steps, waits, or other operations inside map functions: - -```python -from aws_durable_execution_sdk_python import durable_step, StepContext - -@durable_step -def 
fetch_user_data(step_context: StepContext, user_id: str) -> dict: - """Fetch user data from external service.""" - return {"user_id": user_id, "name": "Jane Doe", "email": "jane_doe@example.com"} - -@durable_step -def send_notification(step_context: StepContext, user: dict) -> dict: - """Send notification to user.""" - return {"sent": True, "email": user["email"]} - -def process_user( - context: DurableContext, - user_id: str, - index: int, - user_ids: list[str] -) -> dict: - """Process a user by fetching data and sending notification.""" - # Use steps within the map function - user = context.step(fetch_user_data(user_id)) - notification = context.step(send_notification(user)) - return {"user_id": user_id, "notification_sent": notification["sent"]} - -@durable_execution -def handler(event: dict, context: DurableContext) -> dict: - """Process multiple users using context operations within map functions.""" - user_ids = ["user_1", "user_2", "user_3"] - - result = context.map(user_ids, process_user) - # Convert to dict for JSON serialization (BatchResult is not JSON serializable) - return result.to_dict() -``` - -### Filtering and transforming results - -Access individual results from the `BatchResult`: - -```python -def check_inventory( - context: DurableContext, - product_id: str, - index: int, - products: list[str] -) -> dict: - """Check if a product is in stock.""" - # Check if product is in stock - return {"product_id": product_id, "in_stock": True, "quantity": 10} - -@durable_execution -def handler(event: dict, context: DurableContext) -> list[str]: - product_ids = ["prod_1", "prod_2", "prod_3", "prod_4"] - - # Get all inventory results - batch_result = context.map(product_ids, check_inventory) - - # Filter to only in-stock products - in_stock = [ - r.result["product_id"] - for r in batch_result.results - if r.result["in_stock"] - ] - - return in_stock -``` - -[↑ Back to top](#table-of-contents) - -## Best practices - -**Use descriptive names** - Name your map 
operations for easier debugging: `context.map(items, process_item, name="process_orders")`. - -**Control concurrency for external calls** - When calling external APIs, use `max_concurrency` to avoid rate limits. - -**Define completion criteria** - Use `CompletionConfig` to specify when the operation should succeed or fail. - -**Keep map functions focused** - Each map function should process one item. Don't mix collection iteration with item processing. - -**Use context operations** - Call steps, waits, or other operations inside map functions for complex processing. - -**Handle errors gracefully** - Wrap error-prone code in try-except blocks or use completion criteria to tolerate failures. - -**Consider collection size** - For very large collections (10,000+ items), consider processing in chunks. - -**Monitor memory usage** - Large collections create many checkpoints. Monitor Lambda memory usage. - -**Return only necessary data** - Large result objects increase checkpoint size. Return minimal data from map functions. - -[↑ Back to top](#table-of-contents) - -## Performance tips - -**Parallel execution is automatic** - Items execute concurrently by default. Don't try to manually parallelize. - -**Use max_concurrency wisely** - Too much concurrency can overwhelm external services or exhaust Lambda resources. Start conservative and increase as needed. - -**Optimize map functions** - Keep map functions lightweight. Move heavy computation into steps within the map function. - -**Use appropriate completion criteria** - Fail fast with `tolerated_failure_count` to avoid processing remaining items when many fail. - -**Monitor checkpoint size** - Large result objects increase checkpoint size and Lambda memory usage. Return only necessary data. - -**Consider memory limits** - Processing thousands of items creates many checkpoints. Monitor Lambda memory and adjust concurrency. - -**Profile your workload** - Test with representative data to find optimal concurrency settings. 
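
To make the concurrency tips above concrete, here is a stdlib-only sketch of bounded parallelism with ordered results. It does not use the SDK: `MAX_CONCURRENCY` merely plays the role that `max_concurrency` plays in `MapConfig`, and `process`, `active`, and `peak` are hypothetical names used to observe how many items run at once.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENCY = 3  # stand-in for MapConfig(max_concurrency=3)

lock = threading.Lock()
active = 0  # number of items being processed right now
peak = 0    # highest value `active` ever reached


def process(item: int) -> int:
    """Simulate a short external call while tracking concurrent executions."""
    global active, peak
    with lock:
        active += 1
        peak = max(peak, active)
    time.sleep(0.01)  # stand-in for network latency
    with lock:
        active -= 1
    return item * item


items = list(range(10))

with ThreadPoolExecutor(max_workers=MAX_CONCURRENCY) as pool:
    # Executor.map yields results in input order, regardless of finish order.
    results = list(pool.map(process, items))
```

The observed `peak` never exceeds the configured limit, and `results` keeps input order even though items finish at different times. Profiling a real workload amounts to varying the limit and watching downstream error rates and latency.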
- -[↑ Back to top](#table-of-contents) - -## FAQ - -**Q: What's the difference between map and parallel operations?** - -A: Map operations process a collection of similar items using the same function. Parallel operations execute different functions concurrently. Use map for collections, parallel for heterogeneous tasks. - -**Q: How many items can I process?** - -A: There's no hard limit, but consider Lambda's memory and timeout constraints. For very large collections (10,000+ items), consider processing in chunks. - -**Q: Do items process in order?** - -A: Items execute in parallel, so processing order is non-deterministic. However, results maintain the same order as inputs in the `BatchResult`. - -**Q: What happens if one item fails?** - -A: By default, the map operation continues processing other items. Use `CompletionConfig` to define failure behavior (e.g., fail after N failures). - -**Q: Can I use async functions in map operations?** - -A: No, map functions must be synchronous. If you need async processing, use `asyncio.run()` inside your map function. - -**Q: How do I access individual results?** - -A: The `BatchResult` contains a `results` list with each item's result: - -```python -batch_result = context.map(items, process_item) -for item_result in batch_result.results: - print(item_result.result) -``` - -**Q: Can I nest map operations?** - -A: Yes, you can call `context.map()` inside a map function to process nested collections. - -**Q: What's the difference between serdes and item_serdes?** - -A: `item_serdes` serializes individual item results as they complete. `serdes` serializes the entire `BatchResult` at the end. Use both for custom serialization at different levels. - -**Q: How do I handle partial failures?** - -A: Check the `BatchResult.results` list. 
Each result has a status indicating success or failure: - -```python -batch_result = context.map(items, process_item) -successful = [r for r in batch_result.results if r.status == "SUCCEEDED"] -failed = [r for r in batch_result.results if r.status == "FAILED"] -``` - -**Q: Can I use map operations with steps?** - -A: Yes, call `context.step()` inside your map function to execute steps for each item. - -[↑ Back to top](#table-of-contents) - -## Testing - -You can test map operations using the testing SDK. The test runner executes your function and lets you inspect individual item results. - -### Basic map testing - -```python -import pytest -from aws_durable_execution_sdk_python_testing import InvocationStatus -from my_function import handler - -@pytest.mark.durable_execution( - handler=handler, - lambda_function_name="map_operations", -) -def test_map_operations(durable_runner): - """Test map operations.""" - with durable_runner: - result = durable_runner.run(input={}, timeout=10) - - # Check overall status - assert result.status is InvocationStatus.SUCCEEDED - - # Check the BatchResult - batch_result = result.result - assert batch_result.total_count == 5 - assert batch_result.success_count == 5 - assert batch_result.failure_count == 0 - - # Check individual results - assert batch_result.results[0].result == 1 - assert batch_result.results[1].result == 4 - assert batch_result.results[2].result == 9 -``` - -### Inspecting individual items - -Use `result.get_map()` to inspect the map operation: - -```python -from aws_durable_execution_sdk_python.lambda_service import OperationType - -@pytest.mark.durable_execution( - handler=handler, - lambda_function_name="map_operations", -) -def test_map_individual_items(durable_runner): - """Test individual item processing.""" - with durable_runner: - result = durable_runner.run(input={}, timeout=10) - - # Get the map operation - map_op = result.get_map("square") - assert map_op is not None - - # Verify all items were processed - 
assert map_op.result.total_count == 5 - - # Check specific items - assert map_op.result.results[0].result == 1 - assert map_op.result.results[2].result == 9 -``` - -### Testing error handling - -Test that individual item failures are handled correctly: - -```python -@pytest.mark.durable_execution( - handler=handler_with_errors, - lambda_function_name="map_with_errors", -) -def test_map_error_handling(durable_runner): - """Test error handling in map operations.""" - with durable_runner: - result = durable_runner.run(input={}, timeout=10) - - # Function should handle errors based on completion config - assert result.status is InvocationStatus.SUCCEEDED - - batch_result = result.result - - # Check that some items succeeded - successful = [r for r in batch_result.results if r.status == "SUCCEEDED"] - assert len(successful) > 0 - - # Check that some items failed - failed = [r for r in batch_result.results if r.status == "FAILED"] - assert len(failed) > 0 -``` - -### Testing with configuration - -Test map operations with custom configuration: - -```python -from aws_durable_execution_sdk_python.config import MapConfig, CompletionConfig - -@pytest.mark.durable_execution( - handler=handler_with_config, - lambda_function_name="map_with_config", -) -def test_map_with_config(durable_runner): - """Test map operations with custom configuration.""" - with durable_runner: - result = durable_runner.run(input={}, timeout=30) - - # Verify the map operation completed - assert result.status is InvocationStatus.SUCCEEDED - - # Get the map operation - map_op = result.get_map("process_items") - - # Verify configuration was applied - assert map_op is not None - assert map_op.result.total_count > 0 -``` - -For more testing patterns, see: -- [Basic tests](../testing-patterns/basic-tests.md) - Simple test examples -- [Complex workflows](../testing-patterns/complex-workflows.md) - Multi-step workflow testing -- [Best practices](../testing-patterns/best-practices.md) - Testing recommendations - 
-[↑ Back to top](#table-of-contents) - -## See also - -- [Parallel operations](parallel.md) - Execute different functions concurrently -- [Steps](steps.md) - Understanding step operations -- [Child contexts](child-contexts.md) - Organizing complex workflows -- [Configuration](../api-reference/config.md) - MapConfig and CompletionConfig details -- [BatchResult](../api-reference/result.md) - Working with batch results -- [Examples](https://github.com/awslabs/aws-durable-execution-sdk-python/tree/main/examples/src/map) - More map examples - -[↑ Back to top](#table-of-contents) - -## License - -See the [LICENSE](../../LICENSE) file for our project's licensing. - -[↑ Back to top](#table-of-contents) diff --git a/docs/core/parallel.md b/docs/core/parallel.md deleted file mode 100644 index d638210b..00000000 --- a/docs/core/parallel.md +++ /dev/null @@ -1,911 +0,0 @@ -# Parallel Operations - -## Table of Contents - -- [What are parallel operations?](#what-are-parallel-operations) -- [Terminology](#terminology) -- [Key features](#key-features) -- [Getting started](#getting-started) -- [Method signature](#method-signature) -- [Basic usage](#basic-usage) -- [Collecting results](#collecting-results) -- [Configuration](#configuration) -- [Advanced patterns](#advanced-patterns) -- [Error handling](#error-handling) -- [Result ordering](#result-ordering) -- [Performance considerations](#performance-considerations) -- [Best practices](#best-practices) -- [FAQ](#faq) -- [Testing](#testing) -- [See also](#see-also) - -[← Back to main index](../index.md) - -## Terminology - -**Parallel operation** - An operation that executes multiple functions concurrently using `context.parallel()`. Each function runs in its own child context. - -**Branch** - An individual function within a parallel operation. Each branch executes independently and can succeed or fail without affecting other branches. - -**BatchResult** - The result object returned by parallel operations. 
It includes a `BatchItem` for each branch plus counts and completion metadata. - -**BatchItem** - A per-branch entry with `index`, `status`, `result`, and `error` (if failed). - -**Completion strategy** - Configuration that determines when a parallel operation completes (e.g., all successful, first successful, all completed). - -**Concurrent execution** - Multiple operations executing at the same time. The SDK manages concurrency automatically, executing branches in parallel. - -**Child context** - An isolated execution context created for each branch. Each branch has its own step counter and operation tracking. - -[↑ Back to top](#table-of-contents) - -## What are parallel operations? - -Parallel operations let you execute multiple functions concurrently within a durable function. Each function runs in its own child context and can perform steps, waits, or other operations independently. The SDK manages the concurrent execution and collects results automatically. - -Use parallel operations to: -- Execute independent tasks concurrently for better performance -- Process multiple items that don't depend on each other -- Implement fan-out patterns where one input triggers multiple operations -- Reduce total execution time by running operations simultaneously - -[↑ Back to top](#table-of-contents) - -## Key features - -- **Automatic concurrency** - Functions execute concurrently without manual thread management -- **Independent execution** - Each branch runs in its own child context with isolated state -- **Flexible completion** - Configure when the operation completes (all successful, first successful, etc.) 
-- **Error isolation** - One branch failing doesn't automatically fail others -- **Result collection** - Automatic collection of per-branch status, results, and errors -- **Concurrency control** - Limit maximum concurrent branches with `max_concurrency` -- **Checkpointing** - Results are checkpointed as branches complete - -[↑ Back to top](#table-of-contents) - -## Getting started - -Here's a simple example of parallel operations: - -```python -from aws_durable_execution_sdk_python import ( - BatchResult, - DurableContext, - durable_execution, -) - -@durable_execution -def handler(event: dict, context: DurableContext) -> list[str]: - """Execute three tasks in parallel.""" - # Define functions to execute in parallel - task1 = lambda ctx: ctx.step(lambda _: "Task 1 complete", name="task1") - task2 = lambda ctx: ctx.step(lambda _: "Task 2 complete", name="task2") - task3 = lambda ctx: ctx.step(lambda _: "Task 3 complete", name="task3") - - # Execute all tasks concurrently - result: BatchResult[str] = context.parallel([task1, task2, task3]) - - # Return successful results - return result.get_results() -``` - -When this function runs: -1. All three tasks execute concurrently -2. Each task runs in its own child context -3. Results are collected as tasks complete -4. The `BatchResult` contains per-branch status and results; `get_results()` returns successes - -[↑ Back to top](#table-of-contents) - -## Method signature - -### context.parallel() - -```python -def parallel( - functions: Sequence[Callable[[DurableContext], T]], - name: str | None = None, - config: ParallelConfig | None = None, -) -> BatchResult[T] -``` - -**Parameters:** - -- `functions` - A sequence of callables that each receive a `DurableContext` and return a result. Each function executes in its own child context. -- `name` (optional) - A name for the parallel operation, useful for debugging and testing. 
-- `config` (optional) - A `ParallelConfig` object to configure concurrency limits, completion criteria, and serialization. - -**Returns:** A `BatchResult[T]` object containing: -- `all` - List of `BatchItem` entries (one per branch) with `index`, `status`, `result`, and `error` -- `get_results()` - List of successful branch results -- `get_errors()` - List of `ErrorObject` entries for failed branches -- `succeeded()` / `failed()` / `started()` - `BatchItem` lists filtered by status -- `total_count`, `success_count`, `failure_count`, `started_count` - Branch counts by status -- `status` - Overall `BatchItemStatus` (FAILED if any branch failed) -- `completion_reason` - Why the operation completed -- `throw_if_error()` - Raises the first branch error, if any - -**Raises:** Branch exceptions are captured in the `BatchResult`. Call `throw_if_error()` if you want to raise the first failure. - -[↑ Back to top](#table-of-contents) - -## Basic usage - -### Simple parallel execution - -Execute multiple independent operations concurrently: - -```python -from aws_durable_execution_sdk_python import ( - BatchResult, - DurableContext, - durable_execution, -) - -@durable_execution -def handler(event: dict, context: DurableContext) -> dict: - """Process multiple services in parallel.""" - - def check_inventory(ctx: DurableContext) -> dict: - return ctx.step(lambda _: {"service": "inventory", "status": "ok"}) - - def check_payment(ctx: DurableContext) -> dict: - return ctx.step(lambda _: {"service": "payment", "status": "ok"}) - - def check_shipping(ctx: DurableContext) -> dict: - return ctx.step(lambda _: {"service": "shipping", "status": "ok"}) - - # Execute all checks in parallel - result: BatchResult[dict] = context.parallel([ - check_inventory, - check_payment, - check_shipping, - ]) - - return { - "total": result.total_count, - "successful": result.success_count, - "results": result.get_results(), - } -``` - -## Collecting results - -The `BatchResult` object provides
multiple ways to access results: - -```python -from aws_durable_execution_sdk_python import ( - BatchResult, - DurableContext, - durable_execution, -) - -@durable_execution -def handler(event: dict, context: DurableContext) -> dict: - """Demonstrate result collection.""" - - functions = [ - lambda ctx: ctx.step(lambda _: f"Result {i}") - for i in range(5) - ] - - result: BatchResult[str] = context.parallel(functions) - - return { - # Successful results only - "successful": result.succeeded(), - - # Failed results (if any) - "failed": result.failed(), - - # Counts - "total_count": result.total_count, - "success_count": result.success_count, - "failure_count": result.failure_count, - "started_count": result.started_count, - - # Status information - "status": result.status.value, - "completion_reason": result.completion_reason.value, - } -``` - -Use `result.succeeded()`, `result.failed()`, or `result.started()` for `BatchItem` lists filtered by status, and `result.throw_if_error()` to raise the first failure when you want exceptions instead of error objects. 
- -### Accessing individual results - -Results are ordered by branch index: - -```python -from aws_durable_execution_sdk_python import ( - BatchResult, - DurableContext, - durable_execution, -) - -@durable_execution -def handler(event: dict, context: DurableContext) -> dict: - """Access individual results from parallel execution.""" - - def task_a(ctx: DurableContext) -> str: - return ctx.step(lambda _: "Result A") - - def task_b(ctx: DurableContext) -> str: - return ctx.step(lambda _: "Result B") - - def task_c(ctx: DurableContext) -> str: - return ctx.step(lambda _: "Result C") - - result: BatchResult[str] = context.parallel([task_a, task_b, task_c]) - - results = result.get_results() - - # Access results by index - first_result = results[0] # "Result A" - second_result = results[1] # "Result B" - third_result = results[2] # "Result C" - - return { - "first": first_result, - "second": second_result, - "third": third_result, - "all": results, - } -``` - -If you need branch-indexed access even when failures occur, iterate `result.all` and match on `item.index`. 
- -[↑ Back to top](#table-of-contents) - -## Configuration - -Configure parallel behavior using `ParallelConfig`: - -```python -from aws_durable_execution_sdk_python import ( - BatchResult, - DurableContext, - durable_execution, -) -from aws_durable_execution_sdk_python.config import ( - CompletionConfig, - ParallelConfig, -) - -@durable_execution -def handler(event: dict, context: DurableContext) -> str: - """Configure parallel execution.""" - - # Configure to complete when first branch succeeds - config = ParallelConfig( - max_concurrency=3, # Run at most 3 branches concurrently - completion_config=CompletionConfig.first_successful(), - ) - - functions = [ - lambda ctx: ctx.step(lambda _: "Task 1", name="task1"), - lambda ctx: ctx.step(lambda _: "Task 2", name="task2"), - lambda ctx: ctx.step(lambda _: "Task 3", name="task3"), - ] - - result: BatchResult[str] = context.parallel(functions, config=config) - - # Get the first successful result - results = result.succeeded() - first_result = results[0] if results else "None" - - return f"First successful result: {first_result}" -``` - -### ParallelConfig parameters - -**max_concurrency** - Maximum number of branches to execute concurrently. If `None` (default), all branches run concurrently.
Use this to control resource usage: - -```python -# Limit to 5 concurrent branches -config = ParallelConfig(max_concurrency=5) -``` - -**completion_config** - Defines when the parallel operation completes: - -- `CompletionConfig.all_successful()` - Requires all branches to succeed (default) -- `CompletionConfig.first_successful()` - Completes when any branch succeeds -- `CompletionConfig.all_completed()` - Completes when branches finish; check `started_count` if completion criteria are met early -- Custom configuration with specific success/failure thresholds - -```python -# Require at least 3 successes, tolerate up to 2 failures -config = ParallelConfig( - completion_config=CompletionConfig( - min_successful=3, - tolerated_failure_count=2, - ) -) -``` - -**serdes** - Custom serialization for the `BatchResult` object. If not provided, uses JSON serialization. - -**item_serdes** - Custom serialization for individual branch results. If not provided, uses JSON serialization. - -Note: If completion criteria are met early (min success reached or failure tolerance exceeded), unfinished branches are marked `STARTED` in `result.all` and counted in `started_count`. 
- -[↑ Back to top](#table-of-contents) - -## Advanced patterns - -### First successful pattern - -Execute multiple strategies and use the first one that succeeds: - -```python -from aws_durable_execution_sdk_python import ( - DurableContext, - durable_execution, -) -from aws_durable_execution_sdk_python.config import ( - CompletionConfig, - ParallelConfig, -) - -@durable_execution -def handler(event: dict, context: DurableContext) -> str: - """Try multiple data sources, use first successful.""" - - def try_primary_db(ctx: DurableContext) -> dict: - return ctx.step(lambda _: {"source": "primary", "data": "..."}) - - def try_secondary_db(ctx: DurableContext) -> dict: - return ctx.step(lambda _: {"source": "secondary", "data": "..."}) - - def try_cache(ctx: DurableContext) -> dict: - return ctx.step(lambda _: {"source": "cache", "data": "..."}) - - # Complete as soon as any source succeeds - config = ParallelConfig( - completion_config=CompletionConfig.first_successful() - ) - - result: BatchResult[dict] = context.parallel( - [try_primary_db, try_secondary_db, try_cache], - config=config, - ) - - results = result.get_results() - if results: - return results[0] - - return {"error": "All sources failed"} -``` - -### Controlled concurrency - -Limit concurrent execution to manage resource usage: - -```python -from aws_durable_execution_sdk_python import ( - DurableContext, - durable_execution, -) -from aws_durable_execution_sdk_python.config import ParallelConfig - -@durable_execution -def handler(event: dict, context: DurableContext) -> dict: - """Process many items with controlled concurrency.""" - items = event.get("items", []) - - # Create a function for each item - functions = [ - lambda ctx, item=item: ctx.step( - lambda _: f"Processed {item}", - name=f"process_{item}" - ) - for item in items - ] - - # Process at most 10 items concurrently - config = ParallelConfig(max_concurrency=10) - - result: BatchResult[str] = context.parallel(functions, config=config) - -
return { - "processed": result.success_count, - "failed": result.failure_count, - "results": result.get_results(), - } -``` - -### Partial success handling - -Handle scenarios where some branches can fail: - -```python -from aws_durable_execution_sdk_python import ( - DurableContext, - durable_execution, -) -from aws_durable_execution_sdk_python.config import ( - CompletionConfig, - ParallelConfig, -) - -@durable_execution -def handler(event: dict, context: DurableContext) -> dict: - """Allow some branches to fail.""" - - # Require at least 2 successes, tolerate up to 1 failure - config = ParallelConfig( - completion_config=CompletionConfig( - min_successful=2, - tolerated_failure_count=1, - ) - ) - - functions = [ - lambda ctx: ctx.step(lambda _: "Success 1"), - lambda ctx: ctx.step(lambda _: "Success 2"), - lambda ctx: ctx.step(lambda _: raise_error()), # This might fail - ] - - result: BatchResult[str] = context.parallel(functions, config=config) - - return { - "status": "partial_success", - "successful": result.get_results(), - "failed_count": result.failure_count, - } -``` - -### Nested parallel operations - -Parallel operations can contain other parallel operations: - -```python -from aws_durable_execution_sdk_python import ( - DurableContext, - durable_execution, -) - -@durable_execution -def handler(event: dict, context: DurableContext) -> dict: - """Nested parallel execution.""" - - def process_group_a(ctx: DurableContext) -> list: - # Inner parallel operation for group A - task1 = lambda c: c.step(lambda _: "group-a-item-1") - task2 = lambda c: c.step(lambda _: "group-a-item-2") - task3 = lambda c: c.step(lambda _: "group-a-item-3") - - inner_result = ctx.parallel([task1, task2, task3]) - return inner_result.get_results() - - def process_group_b(ctx: DurableContext) -> list: - # Inner parallel operation for group B - task1 = lambda c: c.step(lambda _: "group-b-item-1") - task2 = lambda c: c.step(lambda _: "group-b-item-2") - task3 = lambda c: 
c.step(lambda _: "group-b-item-3") - - inner_result = ctx.parallel([task1, task2, task3]) - return inner_result.get_results() - - # Outer parallel operation - result: BatchResult[list] = context.parallel([process_group_a, process_group_b]) - - return { - "groups_processed": result.success_count, - "results": result.get_results(), - } -``` - -[↑ Back to top](#table-of-contents) - -## Error handling - -Parallel operations handle errors gracefully, isolating failures to individual branches: - -### Individual branch failures - -When a branch fails, other branches continue executing: - -```python -from aws_durable_execution_sdk_python import ( - DurableContext, - durable_execution, -) -from aws_durable_execution_sdk_python.config import ( - CompletionConfig, - ParallelConfig, -) - -@durable_execution -def handler(event: dict, context: DurableContext) -> dict: - """Handle individual branch failures.""" - - def successful_task(ctx: DurableContext) -> str: - return ctx.step(lambda _: "Success") - - def failing_task(ctx: DurableContext) -> str: - return ctx.step(lambda _: raise_error("Task failed")) - - functions = [successful_task, failing_task, successful_task] - - # Use all_completed to collect per-branch status; check started_count for early completion - config = ParallelConfig( - completion_config=CompletionConfig.all_completed() - ) - - result: BatchResult[str] = context.parallel(functions, config=config) - - return { - "successful": result.succeeded(), - "failed_count": result.failure_count, - "status": result.status.value, - } -``` - -### Checking for failures - -Inspect the `BatchResult` to detect and handle failures: - -```python -from aws_durable_execution_sdk_python import BatchResult - -result: BatchResult = context.parallel(functions) - -if result.failure_count > 0: - # Some branches failed - return { - "status": "partial_failure", - "successful": result.get_results(), - "failed_count": result.failure_count, - } - -# All branches succeeded -return { -
"status": "success", - "results": result.get_results(), -} -``` - -### Completion strategies and errors - -Different completion strategies handle errors differently: - -**all_successful()** - Fails fast when any branch fails: -```python -config = ParallelConfig( - completion_config=CompletionConfig.all_successful() -) -# Stops executing new branches after first failure -``` - -**first_successful()** - Continues until one branch succeeds: -```python -config = ParallelConfig( - completion_config=CompletionConfig.first_successful() -) -# Ignores failures until at least one succeeds -``` - -**all_completed()** - Waits for branches to complete unless completion criteria are met early: -```python -config = ParallelConfig( - completion_config=CompletionConfig.all_completed() -) -# If completion criteria are met early, remaining branches are marked STARTED -``` - -[↑ Back to top](#table-of-contents) - -## Result ordering - -Results in `get_results()` maintain the same order as the input functions: - -```python -from aws_durable_execution_sdk_python import ( - BatchResult, - DurableContext, - durable_execution, -) - -@durable_execution -def handler(event: dict, context: DurableContext) -> list[str]: - """Demonstrate result ordering.""" - - functions = [ - lambda ctx: ctx.step(lambda _: "First"), - lambda ctx: ctx.step(lambda _: "Second"), - lambda ctx: ctx.step(lambda _: "Third"), - ] - - result = context.parallel(functions) - - # Results are in the same order as functions - results = result.get_results() - assert results[0] == "First" - assert results[1] == "Second" - assert results[2] == "Third" - - return results -``` - -**Important:** Even though branches execute concurrently and may complete in any order, the SDK preserves the original order in the results list. This makes it easy to correlate results with inputs. 
- -### Handling partial results - -When some branches fail, `succeeded()` only contains results from successful branches, but the order is still preserved relative to the input: - -```python -# If function at index 1 fails: -# Input: [func0, func1, func2] -# Result: [result0, result2] # result1 is missing, but order preserved -``` - -[↑ Back to top](#table-of-contents) - -## Performance considerations - -### Concurrency limits - -Use `max_concurrency` to balance performance and resource usage: - -```python -from aws_durable_execution_sdk_python import BatchResult -from aws_durable_execution_sdk_python.config import ParallelConfig - -# Process 100 items, but only 10 at a time -config = ParallelConfig(max_concurrency=10) -result: BatchResult = context.parallel(functions, config=config) -``` - -**When to limit concurrency:** -- Processing many items (hundreds or thousands) -- Calling external APIs with rate limits -- Managing memory usage with large data -- Controlling database connection pools - -**When to use unlimited concurrency:** -- Small number of branches (< 50) -- Independent operations with no shared resources -- When maximum speed is critical - -### Completion strategies - -Choose the right completion strategy for your use case: - -**first_successful()** - Best for: -- Redundant operations (multiple data sources) -- Racing multiple strategies -- Minimizing latency - -**all_successful()** - Best for: -- Operations that must all succeed -- Fail-fast behavior -- Strict consistency requirements - -**all_completed()** - Best for: -- Workflows where you want to observe branch outcomes end-to-end -- Collecting partial results (pair with tolerated failure settings if failures are expected) -- Logging or monitoring tasks - -### Checkpointing overhead - -Each branch creates checkpoints as it executes.
For many small branches, consider: -- Batching items together -- Using map operations instead -- Grouping related operations - -[↑ Back to top](#table-of-contents) - -## Best practices - -**Use parallel for independent operations** - Only parallelize operations that don't depend on each other's results. - -**Limit concurrency for large workloads** - Use `max_concurrency` when processing many items to avoid overwhelming resources. - -**Choose appropriate completion strategies** - Match the completion strategy to your business requirements (all must succeed vs. best effort). - -**Handle partial failures gracefully** - Check `failure_count` and handle scenarios where some branches fail. - -**Keep branches focused** - Each branch should be a cohesive unit of work. Don't make branches too granular. - -**Use meaningful names** - Name your parallel operations for easier debugging and testing. - -**Consider map operations for collections** - If you're processing a collection of similar items, use `context.map()` instead. - -**Avoid shared state** - Each branch runs in its own context. Don't rely on shared variables or global state. - -**Monitor resource usage** - Parallel operations can consume significant resources. Monitor memory and API rate limits. - -**Test with realistic concurrency** - Test your parallel operations with realistic numbers of branches to catch resource issues. - -[↑ Back to top](#table-of-contents) - -## FAQ - -**Q: What's the difference between parallel() and map()?** - -A: `parallel()` executes a list of different functions, while `map()` executes the same function for each item in a collection. Use `parallel()` for heterogeneous operations and `map()` for homogeneous operations. - -**Q: How many branches can I run in parallel?** - -A: There's no hard limit, but consider resource constraints. For large numbers (> 100), use `max_concurrency` to limit concurrent execution.
- -**Q: Do branches execute in a specific order?** - -A: Branches execute concurrently, so execution order is non-deterministic. However, results are returned in the same order as the input functions. - -**Q: Can I use async functions in parallel operations?** - -A: No, branch functions must be synchronous. If you need to call async code, use `asyncio.run()` inside your function. - -**Q: What happens if all branches fail?** - -A: The behavior depends on your completion configuration. You always get a `BatchResult`; inspect `get_errors()` or `failed()` to see failures, or call `throw_if_error()` to raise the first error. - -**Q: Can I cancel running branches?** - -A: Not directly. The SDK doesn't provide branch cancellation. Use completion strategies like `first_successful()` to stop starting new branches early. - -**Q: How do I pass different arguments to each branch?** - -A: Use lambda functions with default arguments: - -```python -functions = [ - lambda ctx, val=value: process(ctx, val) - for value in values -] -``` - -**Q: Can branches communicate with each other?** - -A: No, branches are isolated. They can't share state or communicate during execution. Pass data through the parent context or use the results after parallel execution completes. - -**Q: What's the overhead of parallel operations?** - -A: Each branch creates a child context and checkpoints its results. For very small operations, the overhead might outweigh the benefits. Profile your specific use case. - -**Q: Can I nest parallel operations?** - -A: Yes, you can call `context.parallel()` inside a branch function. Each nested parallel operation creates its own set of child contexts. - -[↑ Back to top](#table-of-contents) - -## Testing - -You can test parallel operations using the testing SDK. The test runner executes your function and lets you inspect branch results.
- -### Basic parallel testing - -```python -import pytest -from aws_durable_execution_sdk_python_testing import InvocationStatus -from my_function import handler - -@pytest.mark.durable_execution( - handler=handler, - lambda_function_name="parallel_function", -) -def test_parallel(durable_runner): - """Test parallel operations.""" - with durable_runner: - result = durable_runner.run(input={"data": "test"}, timeout=10) - - # Check overall status - assert result.status is InvocationStatus.SUCCEEDED - - # Check the result contains expected values - assert len(result.result) == 3 - assert "Task 1 complete" in result.result -``` - -### Inspecting branch operations - -Use the test result to inspect individual branch operations: - -```python -from aws_durable_execution_sdk_python_testing import OperationType - -@pytest.mark.durable_execution( - handler=handler, - lambda_function_name="parallel_function", -) -def test_parallel_branches(durable_runner): - """Test and inspect parallel branches.""" - with durable_runner: - result = durable_runner.run(input={}, timeout=10) - - # Verify all step operations exist - step_ops = [ - op for op in result.operations - if op.operation_type == OperationType.STEP - ] - assert len(step_ops) == 3 - - # Check step names - step_names = {op.name for op in step_ops} - assert step_names == {"task1", "task2", "task3"} -``` - -### Testing completion strategies - -Test that completion strategies work correctly: - -```python -@pytest.mark.durable_execution( - handler=handler_first_successful, - lambda_function_name="first_successful_function", -) -def test_first_successful(durable_runner): - """Test first successful completion strategy.""" - with durable_runner: - result = durable_runner.run(input={}, timeout=10) - - # Should succeed with at least one result - assert result.status is InvocationStatus.SUCCEEDED - assert "First successful result:" in result.result -``` - -### Testing error handling - -Test that parallel operations handle errors 
correctly: - -```python -@pytest.mark.durable_execution( - handler=handler_with_failures, - lambda_function_name="parallel_with_failures", -) -def test_parallel_with_failures(durable_runner): - """Test parallel operations with some failures.""" - with durable_runner: - result = durable_runner.run(input={}, timeout=10) - - # Check that some branches succeeded - assert result.status is InvocationStatus.SUCCEEDED - assert result.result["successful_count"] > 0 - assert result.result["failed_count"] > 0 -``` - -### Testing concurrency limits - -Test that concurrency limits are respected: - -```python -@pytest.mark.durable_execution( - handler=handler_with_concurrency_limit, - lambda_function_name="limited_concurrency", -) -def test_concurrency_limit(durable_runner): - """Test parallel operations with concurrency limit.""" - with durable_runner: - result = durable_runner.run(input={"items": list(range(20))}, timeout=30) - - # All items should be processed - assert result.status is InvocationStatus.SUCCEEDED - assert len(result.result["results"]) == 20 -``` - -For more testing patterns, see: -- [Basic tests](../testing-patterns/basic-tests.md) - Simple test examples -- [Complex workflows](../testing-patterns/complex-workflows.md) - Multi-step workflow testing -- [Best practices](../testing-patterns/best-practices.md) - Testing recommendations - -[↑ Back to top](#table-of-contents) - -## See also - -- [Map operations](map.md) - Process collections with the same function -- [Child contexts](child-contexts.md) - Understand child context isolation -- [Steps](steps.md) - Use steps within parallel branches -- [Error handling](../advanced/error-handling.md) - Handle errors in durable functions -- [ParallelConfig](../api-reference/config.md) - Configuration options -- [BatchResult](../api-reference/result.md) - Result object reference -- [Examples](https://github.com/awslabs/aws-durable-execution-sdk-python/tree/main/examples/src/parallel) - More parallel examples - -[↑ Back to
top](#table-of-contents) - -## License - -See the [LICENSE](../../LICENSE) file for our project's licensing. - -[↑ Back to top](#table-of-contents) diff --git a/docs/core/steps.md b/docs/core/steps.md deleted file mode 100644 index 6ad7ac49..00000000 --- a/docs/core/steps.md +++ /dev/null @@ -1,597 +0,0 @@ -# Steps - -## Table of Contents - -- [What are steps?](#what-are-steps) -- [Terminology](#terminology) -- [Key features](#key-features) -- [Getting started](#getting-started) -- [Method signature](#method-signature) -- [Using the @durable_step decorator](#using-the-durable_step-decorator) -- [Naming steps](#naming-steps) -- [Configuration](#configuration) -- [Advanced patterns](#advanced-patterns) -- [Best practices](#best-practices) -- [FAQ](#faq) -- [Testing](#testing) -- [See also](#see-also) - -[← Back to main index](../index.md) - -## Terminology - -**Step** - A durable operation that executes a function and checkpoints its result. Created using `context.step()`. - -**Step function** - A function decorated with `@durable_step` that can be executed as a step. Receives a `StepContext` as its first parameter. - -**Checkpoint** - A saved state of execution that allows your function to resume from a specific point. The SDK creates checkpoints automatically after each step completes. - -**Replay** - The process of re-executing your function code when resuming from a checkpoint. Completed steps return their saved results instantly without re-executing. - -**Step semantics** - Controls how many times a step executes per retry attempt. At-least-once (default) re-executes on retry. At-most-once executes only once per retry attempt. - -**StepContext** - A context object passed to step functions containing metadata about the current execution. - -[↑ Back to top](#table-of-contents) - -## What are steps? - -Steps are the fundamental building blocks of durable functions. A step is a unit of work that executes your code and automatically checkpoints the result.
A completed step won't execute again, it returns its saved result instantly. If a step fails to complete, it automatically retries and saves the error after all retry attempts are exhausted. - -Use steps to: -- Execute business logic with automatic checkpointing -- Retry operations that might fail -- Control execution semantics (at-most-once or at-least-once) -- Break complex workflows into manageable units - -[↑ Back to top](#table-of-contents) - -## Key features - -- **Automatic checkpointing** - Results are saved automatically after execution -- **Configurable retry** - Define retry strategies with custom backoff -- **Execution semantics** - Choose at-most-once or at-least-once per retry -- **Named operations** - Identify steps by name for debugging and testing -- **Custom serialization** - Control how inputs and results are serialized -- **Instant replay** - Completed steps return saved results without re-executing - -[↑ Back to top](#table-of-contents) - -## Getting started - -Here's a simple example of using steps: - -```python -from aws_durable_execution_sdk_python import ( - DurableContext, - durable_execution, - durable_step, - StepContext, -) - -@durable_step -def add_numbers(step_context: StepContext, a: int, b: int) -> int: - """Add two numbers together.""" - return a + b - -@durable_execution -def handler(event: dict, context: DurableContext) -> int: - """Simple durable function with a step.""" - result = context.step(add_numbers(5, 3)) - return result -``` - -When this function runs: -1. `add_numbers(5, 3)` executes and returns 8 -2. The result is checkpointed automatically -3.
If the durable function replays, the step returns 8 instantly without re-executing the `add_numbers` function - -[↑ Back to top](#table-of-contents) - -## Method signature - -### context.step() - -```python -def step( - func: Callable[[StepContext], T], - name: str | None = None, - config: StepConfig | None = None, -) -> T -``` - -**Parameters:** - -- `func` - A callable that receives a `StepContext` and returns a result. Use the `@durable_step` decorator to create step functions. -- `name` (optional) - A name for the step, useful for debugging. If you decorate `func` with `@durable_step`, the SDK uses the function's name automatically. -- `config` (optional) - A `StepConfig` object to configure retry behavior, execution semantics, and serialization. - -**Returns:** The result of executing the step function. - -**Raises:** Any exception raised by the step function (after retries are exhausted if configured). - -[↑ Back to top](#table-of-contents) - -## Using the @durable_step decorator - -The `@durable_step` decorator marks a function as a step function. Step functions receive a `StepContext` as their first parameter: - -```python -from aws_durable_execution_sdk_python import durable_step, StepContext - -@durable_step -def validate_order(step_context: StepContext, order_id: str) -> dict: - """Validate an order.""" - # Your validation logic here - return {"order_id": order_id, "valid": True} - -@durable_execution -def handler(event: dict, context: DurableContext) -> dict: - order_id = event["order_id"] - validation = context.step(validate_order(order_id)) - return validation -``` - -**Why use @durable_step?** - -The decorator wraps your function so it can be called with arguments and passed to `context.step()`. It also automatically uses the wrapped function's name as the step's name.
You can optionally use lambda functions instead: - -```python -# With @durable_step (recommended) -result = context.step(validate_order(order_id)) - -# Optionally, use a lambda function -result = context.step(lambda _: validate_order_logic(order_id)) -``` - -**StepContext parameter:** - -The `StepContext` provides metadata about the current execution. While you must include it in your function signature, you typically don't need to use it unless you need execution metadata or custom logging. - -[↑ Back to top](#table-of-contents) - -## Naming steps - -You can name steps explicitly using the `name` parameter. Named steps are easier to identify in logs and tests: - -```python -@durable_execution -def handler(event: dict, context: DurableContext) -> str: - # Explicit name - result = context.step( - lambda _: "Step with explicit name", - name="custom_step" - ) - return f"Result: {result}" -``` - -If you don't provide a name, the SDK uses the function's name automatically when using `@durable_step`: - -```python -@durable_step -def process_payment(step_context: StepContext, amount: float) -> dict: - return {"status": "completed", "amount": amount} - -@durable_execution -def handler(event: dict, context: DurableContext) -> dict: - # Step is automatically named "process_payment" - result = context.step(process_payment(100.0)) - return result -``` - -**Naming best practices:** - -- Use descriptive names that explain what the step does -- Keep names consistent across your codebase -- Use names when you need to inspect specific steps in tests -- Let the SDK auto-name steps when using `@durable_step` - -**Note:** Names don't need to be unique, but using distinct names improves observability when debugging or monitoring your workflows.
- -[↑ Back to top](#table-of-contents) - -## Configuration - -Configure step behavior using `StepConfig`: - -```python -from aws_durable_execution_sdk_python import ( - DurableContext, - durable_execution, - durable_step, - StepContext, -) -from aws_durable_execution_sdk_python.config import StepConfig, StepSemantics -from aws_durable_execution_sdk_python.retries import ( - RetryStrategyConfig, - create_retry_strategy, -) - -@durable_step -def process_data(step_context: StepContext, data: str) -> dict: - """Process data with potential for transient failures.""" - # Your processing logic here - return {"processed": data, "status": "completed"} - -@durable_execution -def handler(event: dict, context: DurableContext) -> dict: - # Create a retry strategy - retry_config = RetryStrategyConfig( - max_attempts=3, - retryable_error_types=[RuntimeError, ValueError], - ) - - # Configure the step - step_config = StepConfig( - retry_strategy=create_retry_strategy(retry_config), - step_semantics=StepSemantics.AT_LEAST_ONCE_PER_RETRY, - ) - - # Use the configuration - result = context.step(process_data(event["data"]), config=step_config) - return result -``` - -### StepConfig parameters - -**retry_strategy** - A function that determines whether to retry after an exception. Use `create_retry_strategy()` to build one from `RetryStrategyConfig`. - -**step_semantics** - Controls execution behavior on retry: -- `AT_LEAST_ONCE_PER_RETRY` (default) - Step re-executes on each retry attempt -- `AT_MOST_ONCE_PER_RETRY` - Step executes only once per retry attempt, even if the function is replayed - -**serdes** - Custom serialization/deserialization for the step result. If not provided, uses JSON serialization.
- -[↑ Back to top](#table-of-contents) - -## Advanced patterns - -### Retry with exponential backoff - -Configure steps to retry with exponential backoff when they fail: - -```python -from aws_durable_execution_sdk_python import ( - DurableContext, - durable_execution, -) -from aws_durable_execution_sdk_python.config import StepConfig -from aws_durable_execution_sdk_python.retries import ( - RetryStrategyConfig, - create_retry_strategy, -) - -@durable_execution -def handler(event: dict, context: DurableContext) -> str: - # Configure exponential backoff - retry_config = RetryStrategyConfig( - max_attempts=3, - initial_delay_seconds=1, - max_delay_seconds=10, - backoff_rate=2.0, - ) - - step_config = StepConfig( - retry_strategy=create_retry_strategy(retry_config) - ) - - result = context.step( - lambda _: "Step with exponential backoff", - name="retry_step", - config=step_config, - ) - return f"Result: {result}" -``` - -This configuration: -- Retries up to 3 times -- Waits 1 second before the first retry -- Doubles the wait time for each subsequent retry (2s, 4s, 8s) -- Caps the wait time at 10 seconds - -### Retry specific exceptions - -Only retry certain types of errors: - -```python -from random import random -from aws_durable_execution_sdk_python import ( - DurableContext, - durable_execution, - durable_step, - StepContext, -) -from aws_durable_execution_sdk_python.config import StepConfig -from aws_durable_execution_sdk_python.retries import ( - RetryStrategyConfig, - create_retry_strategy, -) - -@durable_step -def unreliable_operation(step_context: StepContext) -> str: - """Operation that might fail.""" - if random() > 0.5: - raise RuntimeError("Random error occurred") - return "Operation succeeded" - -@durable_execution -def handler(event: dict, context: DurableContext) -> str: - # Only retry RuntimeError, not other exceptions - retry_config = RetryStrategyConfig( - max_attempts=3, - retryable_error_types=[RuntimeError], - ) - - result = context.step( - 
unreliable_operation(), - config=StepConfig(create_retry_strategy(retry_config)), - ) - - return result -``` - -### At-most-once semantics - -Use at-most-once semantics when your step has side effects that shouldn't be repeated: - -```python -from aws_durable_execution_sdk_python.config import StepConfig, StepSemantics - -@durable_step -def charge_credit_card(step_context: StepContext, amount: float) -> dict: - """Charge a credit card - should only happen once.""" - # Payment processing logic - return {"transaction_id": "txn_123", "status": "completed"} - -@durable_execution -def handler(event: dict, context: DurableContext) -> dict: - # Use at-most-once to prevent duplicate charges - step_config = StepConfig( - step_semantics=StepSemantics.AT_MOST_ONCE_PER_RETRY - ) - - payment = context.step( - charge_credit_card(event["amount"]), - config=step_config, - ) - - return payment -``` - -With at-most-once semantics: -- The step executes only once per retry attempt -- If the function replays due to Lambda recycling, the step returns the saved result -- Use this for operations with side effects like payments, emails, or database writes - -### Multiple steps in sequence - -Chain multiple steps together to build complex workflows: - -```python -@durable_step -def fetch_user(step_context: StepContext, user_id: str) -> dict: - """Fetch user data.""" - return {"user_id": user_id, "name": "Jane Doe", "email": "jane_doe@example.com"} - -@durable_step -def validate_user(step_context: StepContext, user: dict) -> bool: - """Validate user data.""" - return user.get("email") is not None - -@durable_step -def send_notification(step_context: StepContext, user: dict) -> dict: - """Send notification to user.""" - return {"sent": True, "email": user["email"]} - -@durable_execution -def handler(event: dict, context: DurableContext) -> dict: - user_id = event["user_id"] - - # Step 1: Fetch user - user = context.step(fetch_user(user_id)) - - # Step 2: Validate user - is_valid = 
context.step(validate_user(user)) - - if not is_valid: - return {"status": "failed", "reason": "invalid_user"} - - # Step 3: Send notification - notification = context.step(send_notification(user)) - - return { - "status": "completed", - "user_id": user_id, - "notification_sent": notification["sent"], - } -``` - -Each step is checkpointed independently. If the function is interrupted after step 1, it resumes at step 2 without re-fetching the user. - -[↑ Back to top](#table-of-contents) - -## Best practices - -**Use @durable_step for reusable functions** - Decorate functions you'll use as steps to get automatic naming and a convenient, succinct call syntax. - -**Name steps for debugging** - Use explicit names for steps you'll need to inspect in logs or tests. - -**Keep steps focused** - Each step should do one thing. Break complex operations into multiple steps. - -**Use retry for transient failures** - Configure retry strategies for operations that might fail temporarily (network calls, rate limits). - -**Choose semantics carefully** - Use at-most-once for operations with side effects. Use at-least-once (default) for idempotent operations. - -**Don't share state between steps** - Pass data between steps through return values, not global variables. - -**Wrap non-deterministic code in steps** - All non-deterministic code, such as random values or timestamps, must be wrapped in a step. Once the step completes, the result won't change on replay. - -**Handle errors explicitly** - Catch and handle exceptions in your step functions. Let retries handle transient failures. - -[↑ Back to top](#table-of-contents) - -## FAQ - -**Q: What's the difference between a step and a regular function call?** - -A: A step is checkpointed automatically. Completed steps return their saved results without re-executing. Regular function calls execute every time your function runs.
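To make that distinction concrete, here is a toy model of checkpoint-and-replay. It is only a sketch under stated assumptions: `ToyContext` and its in-memory `checkpoints` list are hypothetical stand-ins, not part of the SDK, which persists checkpoints durably and handles far more than this.

```python
from typing import Any, Callable


class ToyContext:
    """Hypothetical stand-in for a durable context (not the SDK's DurableContext)."""

    def __init__(self, checkpoints: list[Any]) -> None:
        self.checkpoints = checkpoints  # results saved by earlier invocations
        self._position = 0  # index of the next step in this invocation

    def step(self, func: Callable[[], Any]) -> Any:
        index = self._position
        self._position += 1
        if index < len(self.checkpoints):
            # Completed step: return the saved result without re-executing.
            return self.checkpoints[index]
        result = func()
        self.checkpoints.append(result)  # checkpoint the new result
        return result


calls = {"step": 0, "plain": 0}


def fetch() -> str:
    calls["step"] += 1
    return "data"


def plain_helper() -> None:
    calls["plain"] += 1


def handler(context: ToyContext) -> str:
    plain_helper()  # regular call: runs on every invocation
    return context.step(fetch)  # step: runs once, then replays


saved: list[Any] = []
first = handler(ToyContext(saved))   # first invocation executes the step
second = handler(ToyContext(saved))  # replay returns the checkpointed result
```

After both invocations, `calls` records one execution of the step body and two executions of the plain helper, which is the behavior the answer above describes.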
- -**Q: When should I use at-most-once vs at-least-once semantics?** - -A: Use at-most-once for operations with side effects (payments, emails, database writes). Use at-least-once (default) for idempotent operations (calculations, data transformations). - -**Q: Can I use async functions as steps?** - -A: No, step functions must be synchronous. If you need to call async code, use `asyncio.run()` inside your step function. - -**Q: How do I pass multiple arguments to a step?** - -A: Use the `@durable_step` decorator and pass arguments when calling the function: - -```python -@durable_step -def my_step(step_context: StepContext, arg1: str, arg2: int) -> str: - return f"{arg1}: {arg2}" - -result = context.step(my_step("value", 42)) -``` - -**Q: Can I nest steps inside other steps?** - -A: No, you can't call `context.step()` inside a step function. Steps are leaf operations. Use child contexts if you need nested operations. - -**Q: What happens if a step raises an exception?** - -A: If no retry strategy is configured, the exception propagates and fails the execution. If retry is configured, the SDK retries according to your strategy. After exhausting retries, the step checkpoints the error and the exception propagates. - -**Q: How do I access the StepContext?** - -A: The `StepContext` is passed as the first parameter to your step function. It contains metadata about the execution, though you typically don't need to use it. - -**Q: Can I use lambda functions as steps?** - -A: Yes, but they won't have automatic names: - -```python -result = context.step(lambda _: "some value", name="my_step") -``` - -Use `@durable_step` for better ergonomics. - -[↑ Back to top](#table-of-contents) - -## Testing - -You can test steps using the testing SDK. The test runner executes your function and lets you inspect step results. 
- -### Basic step testing - -```python -import pytest -from aws_durable_execution_sdk_python_testing import InvocationStatus -from my_function import handler - -@pytest.mark.durable_execution( - handler=handler, - lambda_function_name="my_function", -) -def test_step(durable_runner): - """Test a function with steps.""" - with durable_runner: - result = durable_runner.run(input={"data": "test"}, timeout=10) - - # Check overall status - assert result.status is InvocationStatus.SUCCEEDED - - # Check final result - assert result.result == 8 -``` - -### Inspecting step results - -Use `result.get_step()` to inspect individual step results: - -```python -@pytest.mark.durable_execution( - handler=handler, - lambda_function_name="my_function", -) -def test_step_result(durable_runner): - """Test and inspect step results.""" - with durable_runner: - result = durable_runner.run(input={"data": "test"}, timeout=10) - - # Get step by name - step_result = result.get_step("add_numbers") - assert step_result.result == 8 - - # Check step status - assert step_result.status is InvocationStatus.SUCCEEDED -``` - -### Testing retry behavior - -Test that steps retry correctly on failure: - -```python -@pytest.mark.durable_execution( - handler=handler_with_retry, - lambda_function_name="retry_function", -) -def test_step_retry(durable_runner): - """Test step retry behavior.""" - with durable_runner: - result = durable_runner.run(input={}, timeout=30) - - # Function should eventually succeed after retries - assert result.status is InvocationStatus.SUCCEEDED - - # Inspect the step that retried - step_result = result.get_step("unreliable_operation") - assert step_result.status is InvocationStatus.SUCCEEDED -``` - -### Testing error handling - -Test that steps fail correctly when errors occur: - -```python -@pytest.mark.durable_execution( - handler=handler_with_error, - lambda_function_name="error_function", -) -def test_step_error(durable_runner): - """Test step error handling.""" - with 
durable_runner: - result = durable_runner.run(input={}, timeout=10) - - # Function should fail - assert result.status is InvocationStatus.FAILED - - # Check the error - assert "RuntimeError" in str(result.error) -``` - -For more testing patterns, see: -- [Basic tests](../testing-patterns/basic-tests.md) - Simple test examples -- [Complex workflows](../testing-patterns/complex-workflows.md) - Multi-step workflow testing -- [Best practices](../testing-patterns/best-practices.md) - Testing recommendations - -[↑ Back to top](#table-of-contents) - -## See also - -- [DurableContext API](../api-reference/context.md) - Complete context reference -- [StepConfig](../api-reference/config.md) - Configuration options -- [Retry strategies](../advanced/error-handling.md) - Implementing retry logic -- [Wait operations](wait.md) - Pause execution between steps -- [Child contexts](child-contexts.md) - Organize complex workflows -- [Examples](https://github.com/awslabs/aws-durable-execution-sdk-python/tree/main/examples/src/step) - More step examples - -[↑ Back to top](#table-of-contents) - -## License - -See the [LICENSE](../../LICENSE) file for our project's licensing. 
- -[↑ Back to top](#table-of-contents) diff --git a/docs/core/wait.md b/docs/core/wait.md deleted file mode 100644 index e7d40a1b..00000000 --- a/docs/core/wait.md +++ /dev/null @@ -1,445 +0,0 @@ -# Wait Operations - -## Table of Contents - -- [What are wait operations?](#what-are-wait-operations) -- [When to use wait operations](#when-to-use-wait-operations) -- [Terminology](#terminology) -- [Key features](#key-features) -- [Getting started](#getting-started) -- [Method signature](#method-signature) -- [Duration helpers](#duration-helpers) -- [Naming wait operations](#naming-wait-operations) -- [Multiple sequential waits](#multiple-sequential-waits) -- [Understanding scheduled_end_timestamp](#understanding-scheduled_end_timestamp) -- [Best practices](#best-practices) -- [FAQ](#faq) -- [Alternatives to wait operations](#alternatives-to-wait-operations) -- [Testing](#testing) -- [See also](#see-also) - -[← Back to main index](../index.md) - -## Terminology - -**Wait operation** - A durable operation that pauses execution for a specified duration. Created using `context.wait()`. - -**Duration** - A time period specified in seconds, minutes, hours, or days using the `Duration` class. - -**Scheduled end timestamp** - The Unix timestamp (in milliseconds) when the wait operation is scheduled to complete. - -**Suspend** - The process of pausing execution and saving state. The Lambda function exits and resumes later. - -**Resume** - The process of continuing execution after a wait completes. The SDK automatically invokes your function again. - -[↑ Back to top](#table-of-contents) - -## What are wait operations? - -Wait operations pause execution for a specified time. Your function suspends, the Lambda exits, and the system automatically resumes execution when the wait completes. - -Unlike `time.sleep()`, waits don't consume Lambda execution time. Your function checkpoints, exits cleanly, and resumes later, even if the wait lasts hours or days.
- -[↑ Back to top](#table-of-contents) - -## When to use wait operations - -Use `context.wait()` when you need a simple time-based delay. - -**Choose a different method if you need:** -- **Wait for external system response** → Use [`context.wait_for_callback()`](callbacks.md) -- **Wait until a condition is met** → Use `context.wait_for_condition()` -- **Wait for a step to complete** → Use [`context.step()`](steps.md) - -[↑ Back to top](#table-of-contents) - -## Key features - -- **Durable pauses** - Execution suspends and resumes automatically -- **Flexible durations** - Specify time in seconds, minutes, hours, or days -- **Named operations** - Identify waits by name for debugging and testing -- **Automatic scheduling** - The SDK handles timing and resumption -- **Sequential waits** - Chain multiple waits together -- **No polling required** - The system invokes your function when ready - -[↑ Back to top](#table-of-contents) - -## Getting started - -Here's a simple example of using a wait operation: - -```python -from aws_durable_execution_sdk_python import DurableContext, durable_execution -from aws_durable_execution_sdk_python.config import Duration - -@durable_execution -def handler(event: dict, context: DurableContext) -> str: - """Simple durable function with a wait.""" - # Wait for 5 seconds - context.wait(duration=Duration.from_seconds(5)) - return "Wait completed" -``` - -When this function runs: -1. The wait operation is checkpointed with a scheduled end time -2. The Lambda function exits (suspends) -3. After 5 seconds, the system automatically invokes your function again -4. Execution resumes after the wait and returns "Wait completed" - -[↑ Back to top](#table-of-contents) - -## Method signature - -### context.wait() - -```python -def wait( - duration: Duration, - name: str | None = None, -) -> None -``` - -**Parameters:** - -- `duration` (Duration, required) - How long to wait. Must be at least 1 second.
Use `Duration.from_seconds()`, `Duration.from_minutes()`, `Duration.from_hours()`, or `Duration.from_days()` to create a duration. -- `name` (str, optional) - A name for the wait operation. Useful for debugging and testing. - -**Returns:** None - -**Raises:** -- `ValidationError` - If duration is less than 1 second - -[↑ Back to top](#table-of-contents) - -## Duration helpers - -The `Duration` class provides convenient methods to specify time periods: - -```python -from aws_durable_execution_sdk_python.config import Duration - -# Wait for 30 seconds -context.wait(duration=Duration.from_seconds(30)) - -# Wait for 5 minutes -context.wait(duration=Duration.from_minutes(5)) - -# Wait for 2 hours -context.wait(duration=Duration.from_hours(2)) - -# Wait for 1 day -context.wait(duration=Duration.from_days(1)) -``` - -If using duration in seconds, you can also create a Duration directly: - -```python -# Wait for 300 seconds (5 minutes) -context.wait(duration=Duration(seconds=300)) -``` - -[↑ Back to top](#table-of-contents) - -## Naming wait operations - -You can name wait operations to make them easier to identify in logs and tests: - -```python -from aws_durable_execution_sdk_python import DurableContext, durable_execution -from aws_durable_execution_sdk_python.config import Duration - -@durable_execution -def handler(event: dict, context: DurableContext) -> str: - """Durable function with a named wait.""" - # Wait with explicit name - context.wait(duration=Duration.from_seconds(2), name="custom_wait") - return "Wait with name completed" -``` - -Named waits are helpful when: -- You have multiple waits in your function -- You want to identify specific waits in test assertions -- You're debugging execution flow - -[↑ Back to top](#table-of-contents) - -## Understanding scheduled_end_timestamp - -Each wait operation has a `scheduled_end_timestamp` attribute that indicates when the wait is scheduled to complete. This timestamp is in Unix milliseconds. 
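As a rough illustration of the arithmetic, assuming the scheduled end is simply the checkpoint time plus the wait duration, the sketch below computes a Unix-millisecond timestamp with the standard library. The helper `scheduled_end_ms` is hypothetical and not an SDK function.

```python
from datetime import datetime, timedelta, timezone


def scheduled_end_ms(checkpointed_at: datetime, wait: timedelta) -> int:
    """Hypothetical helper: Unix-millisecond timestamp at which a wait would end."""
    return int((checkpointed_at + wait).timestamp() * 1000)


# A 5-second wait checkpointed at 2025-01-01T00:00:00Z.
start = datetime(2025, 1, 1, tzinfo=timezone.utc)
end_ms = scheduled_end_ms(start, timedelta(seconds=5))

# Convert back to a datetime to read the value in logs or test output.
end_time = datetime.fromtimestamp(end_ms / 1000, tz=timezone.utc)
```

Converting the millisecond value back to a timezone-aware `datetime` is a convenient way to compare it against an expected resume time in a test.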
- -You can access this timestamp when inspecting operations in tests or logs. The SDK uses this timestamp to determine when to resume your function. - -The scheduled end time is calculated when the wait operation is first checkpointed: -- Current time + wait duration = scheduled end timestamp - -[↑ Back to top](#table-of-contents) - -## Best practices - -### Choose appropriate wait durations - -When your function hits a wait, it terminates execution and doesn't incur compute charges during the wait period. The function resumes with a new invocation when the wait completes. Choose durations based on your workflow needs: - -```python -# Short wait for rate limiting -context.wait(duration=Duration.from_seconds(30)) - -# Medium wait for polling intervals -context.wait(duration=Duration.from_minutes(5)) - -# Long wait for scheduled tasks -context.wait(duration=Duration.from_hours(24)) -``` - -**Note:** If you have concurrent operations running (like parallel or map operations), those continue executing even when the main execution hits a wait. The function waits for all concurrent operations to complete before terminating. 
- -### Use named waits for clarity - -Name your waits when you have multiple waits or complex logic: - -```python -# Good - clear purpose -context.wait(duration=Duration.from_seconds(60), name="rate_limit_cooldown") -context.wait(duration=Duration.from_minutes(5), name="polling_interval") - -# Less clear - unnamed waits -context.wait(duration=Duration.from_seconds(60)) -context.wait(duration=Duration.from_minutes(5)) -``` - -### Combine waits with steps - -Use waits between steps to implement delays in your workflow: - -```python -@durable_execution -def handler(event: dict, context: DurableContext) -> dict: - # Start a process - job_id = context.step(start_job()) - - # Wait before checking status - context.wait(duration=Duration.from_seconds(30), name="initial_delay") - - # Check status - status = context.step(check_job_status(job_id)) - - return {"job_id": job_id, "status": status} -``` - -### Avoid very short waits - -Waits must be at least 1 second. For very short delays, consider whether you actually need a wait: - -```python -# Avoid - too short, will raise ValidationError -context.wait(duration=Duration.from_seconds(0)) - -# Minimum - 1 second -context.wait(duration=Duration.from_seconds(1)) - -# Better - use meaningful durations -context.wait(duration=Duration.from_seconds(5)) -``` - -[↑ Back to top](#table-of-contents) - -## FAQ - -### How long can a wait operation last? - -There is an upper limit of 1 year, which is the maximum length of an execution. - -The wait itself doesn't consume Lambda execution time: your function suspends and resumes later. However, consider the cost implications of long-running executions. - -### Can I cancel a wait operation? - -No, once a wait operation is checkpointed, it will complete after the specified duration. Design your workflows with this in mind. - -### Do waits execute in parallel? - -No, waits execute sequentially in the order they appear in your code.
If you need parallel operations, use `context.parallel()` or `context.map()` instead. - -### How accurate are wait durations? - -Wait durations are approximate. The actual resume time depends on: -- System scheduling -- Lambda cold start time -- Current system load - -### Can I use waits for polling? - -You can, but we recommend using `context.wait_for_condition()` instead. It simplifies polling by handling the loop logic for you: - -```python -from aws_durable_execution_sdk_python.waits import WaitForConditionConfig, FixedWait - -@durable_execution -def handler(event: dict, context: DurableContext) -> dict: - job_id = context.step(start_job()) - - # wait_for_condition handles the polling loop - def check_status(state, check_context): - status = get_job_status(state["job_id"]) - return {"job_id": state["job_id"], "status": status} - - result = context.wait_for_condition( - check=check_status, - config=WaitForConditionConfig( - initial_state={"job_id": job_id}, - condition=lambda state: state["status"] == "completed", - wait_strategy=FixedWait(Duration.from_minutes(1)) - ) - ) - return result -``` - -[↑ Back to top](#table-of-contents) - -## Alternatives to wait operations - -### Using wait_for_callback for external responses - -When you need to wait for an external system to respond, use `context.wait_for_callback()`: - -```python -from aws_durable_execution_sdk_python import DurableContext, durable_execution - -@durable_execution -def handler(event: dict, context: DurableContext) -> dict: - # Wait for external approval - def submit_for_approval(callback_id: str): - # Send callback_id to external approval system - send_to_approval_system(callback_id) - - result = context.wait_for_callback( - submitter=submit_for_approval, - name="approval_wait" - ) - return result -``` - -See [Callbacks](callbacks.md) for more details. 
- -### Using wait_for_condition for polling - -When you need to poll until a condition is met, use `context.wait_for_condition()`: - -```python -from aws_durable_execution_sdk_python import DurableContext, durable_execution -from aws_durable_execution_sdk_python.waits import WaitForConditionConfig, ExponentialBackoff -from aws_durable_execution_sdk_python.config import Duration - -@durable_execution -def handler(event: dict, context: DurableContext) -> dict: - # Poll until job completes - def check_job_status(state, check_context): - status = get_job_status(state["job_id"]) - return { - "job_id": state["job_id"], - "status": status, - "done": status == "COMPLETED" - } - - result = context.wait_for_condition( - check=check_job_status, - config=WaitForConditionConfig( - initial_state={"job_id": "job-123", "done": False}, - condition=lambda state: state["done"], - wait_strategy=ExponentialBackoff( - initial_wait=Duration.from_seconds(5) - ) - ) - ) - return result -``` - -[↑ Back to top](#table-of-contents) - -## Testing - -### Testing wait operations - -You can verify wait operations in your tests by inspecting the operations list: - -```python -import pytest -from aws_durable_execution_sdk_python.execution import InvocationStatus -from src.wait import wait - -@pytest.mark.durable_execution( - handler=wait.handler, - lambda_function_name="Wait State", -) -def test_wait(durable_runner): - """Test wait example.""" - with durable_runner: - result = durable_runner.run(input="test", timeout=10) - - assert result.status is InvocationStatus.SUCCEEDED - - # Find the wait operation - wait_ops = [op for op in result.operations if op.operation_type.value == "WAIT"] - assert len(wait_ops) == 1 - - # Verify the wait has a scheduled end timestamp - wait_op = wait_ops[0] - assert wait_op.scheduled_end_timestamp is not None -``` - -### Testing multiple waits - -When testing functions with multiple waits, you can verify each wait individually: - -```python 
-@pytest.mark.durable_execution(handler=multiple_wait.handler) -def test_multiple_waits(durable_runner): - """Test multiple sequential waits.""" - with durable_runner: - result = durable_runner.run(input="test", timeout=20) - - assert result.status is InvocationStatus.SUCCEEDED - - # Find all wait operations - wait_ops = [op for op in result.operations if op.operation_type.value == "WAIT"] - assert len(wait_ops) == 2 - - # Verify both waits have names - wait_names = [op.name for op in wait_ops] - assert "wait-1" in wait_names - assert "wait-2" in wait_names -``` - -### Testing named waits - -Named waits are easier to identify in tests: - -```python -@pytest.mark.durable_execution(handler=wait_with_name.handler) -def test_named_wait(durable_runner): - """Test wait with custom name.""" - with durable_runner: - result = durable_runner.run(input="test", timeout=10) - - assert result.status is InvocationStatus.SUCCEEDED - - # Find the named wait operation - wait_ops = [op for op in result.operations - if op.operation_type.value == "WAIT" and op.name == "custom_wait"] - assert len(wait_ops) == 1 -``` - -[↑ Back to top](#table-of-contents) - -## See also - -- [Steps](steps.md) - Execute business logic with automatic checkpointing -- [Callbacks](callbacks.md) - Wait for external system responses -- [Getting Started](../getting-started.md) - Learn the basics of durable functions - -[↑ Back to top](#table-of-contents) - -## License - -See the [LICENSE](../../LICENSE) file for our project's licensing. 
- -[↑ Back to main index](../index.md) diff --git a/docs/getting-started.md b/docs/getting-started.md deleted file mode 100644 index f89452b1..00000000 --- a/docs/getting-started.md +++ /dev/null @@ -1,293 +0,0 @@ -# Getting started - -## Table of Contents - -- [Overview](#overview) -- [The two SDKs](#the-two-sdks) -- [How durable execution works](#how-durable-execution-works) -- [Your development workflow](#your-development-workflow) -- [Quick start](#quick-start) -- [Next steps](#next-steps) - -[← Back to main index](index.md) - -## Overview - -This guide explains the fundamental concepts behind durable execution and how the SDK works. You'll understand: - -- The difference between `aws-durable-execution-sdk-python` and `aws-durable-execution-sdk-python-testing` -- How checkpoints and replay enable reliable workflows -- Why your function code runs multiple times but side effects happen once -- The development workflow from writing to testing to deployment - -[↑ Back to top](#table-of-contents) - -## The two SDKs - -The durable execution ecosystem has two separate packages: - -### Execution SDK (aws-durable-execution-sdk-python) - -This is the **core SDK** that runs in your Lambda functions. It provides: - -- `DurableContext` - The main interface for durable operations -- Operations - Steps, waits, callbacks, parallel, map, child contexts -- Decorators - `@durable_execution`, `@durable_step`, etc. -- Configuration - StepConfig, CallbackConfig, retry strategies -- Serialization - How data is saved in checkpoints - -Install it in your Lambda deployment package: - -```console -pip install aws-durable-execution-sdk-python -``` - -### Testing SDK (aws-durable-execution-sdk-python-testing) - -This is a **separate SDK** for testing your durable functions.
It provides: - -- `DurableFunctionTestRunner` - Run functions locally without AWS -- `DurableFunctionCloudTestRunner` - Test deployed Lambda functions -- Pytest integration - Fixtures and markers for writing tests -- Result inspection - Examine execution state and operation results - -Install it in your development environment only: - -```console -pip install aws-durable-execution-sdk-python-testing -``` - -**Key distinction:** The execution SDK runs in production Lambda. The testing SDK runs on your laptop or CI/CD. They're separate concerns. - -[↑ Back to top](#table-of-contents) - -## How durable execution works - -Let's trace through a simple workflow to understand the execution model: - -```python -@durable_execution -def handler(event: dict, context: DurableContext) -> dict: - # Step 1: Call external API - data = context.step(fetch_data(event["id"])) - - # Step 2: Wait 30 seconds - context.wait(Duration.from_seconds(30)) - - # Step 3: Process the data - result = context.step(process_data(data)) - - return result -``` - -**First invocation (t=0s):** - -1. Lambda invokes your function -2. `fetch_data` executes and calls an external API -3. Result is checkpointed to AWS -4. `context.wait(Duration.from_seconds(30))` is reached -5. Function returns, Lambda can recycle the environment - -**Second invocation (t=30s):** - -1. Lambda invokes your function again -2. Function code runs from the beginning -3. `fetch_data` returns the checkpointed result instantly (no API call) -4. `context.wait(Duration.from_seconds(30))` is already complete, execution continues -5. `process_data` executes for the first time -6. Result is checkpointed -7. 
Function returns the final result - -**Key insights:** - -- Your function code runs twice, but `fetch_data` only calls the API once -- The wait doesn't block Lambda - your environment can be recycled -- You write linear code that looks synchronous -- The SDK handles all the complexity of state management - -[↑ Back to top](#table-of-contents) - -## Your development workflow - -```mermaid -flowchart LR - subgraph dev["Development (Local)"] - direction LR - A["1. Write Function<br/>aws-durable-execution-sdk-python"] - B["2. Write Tests<br/>aws-durable-execution-sdk-python-testing"] - C["3. Run Tests<br/>pytest"] - end - - subgraph prod["Production (AWS)"] - direction LR - D["4. Deploy<br/>SAM/CDK/Terraform"] - E["5. Test in Cloud<br/>pytest --runner-mode=cloud"] - end - - A --> B --> C --> D --> E - - style dev fill:#e3f2fd - style prod fill:#fff3e0 -``` - -Here's how you build and test durable functions: - -### 1. Write your function (execution SDK) - -Install the execution SDK and write your Lambda handler: - -```console -pip install aws-durable-execution-sdk-python -``` - -```python -from aws_durable_execution_sdk_python import ( - DurableContext, - durable_execution, - durable_step, -) - -@durable_step -def my_step(step_context, data): - # Your business logic goes here; this placeholder returns the input unchanged - return data - -@durable_execution -def handler(event, context: DurableContext): - result = context.step(my_step(event["data"])) - return result -``` - -### 2. Test locally (testing SDK) - -Install the testing SDK and write tests: - -```console -pip install aws-durable-execution-sdk-python-testing -``` - -```python -import pytest -from aws_durable_execution_sdk_python.execution import InvocationStatus -from my_function import handler - -@pytest.mark.durable_execution(handler=handler, lambda_function_name="my_function") -def test_my_function(durable_runner): - with durable_runner: - result = durable_runner.run(input={"data": "test"}, timeout=10) - assert result.status == InvocationStatus.SUCCEEDED -``` - -Run tests without AWS credentials: - -```console -pytest test_my_function.py -``` - -### 3. Deploy to Lambda - -Package your function with the execution SDK (not the testing SDK) and deploy using your preferred tool (SAM, CDK, Terraform, etc.). - -### 4. Test in the cloud (optional) - -Run the same tests against your deployed function. The function name is single-quoted so that the shell does not expand `$LATEST` as a variable: - -```console -export AWS_REGION=us-west-2 -export QUALIFIED_FUNCTION_NAME='MyFunction:$LATEST' -export LAMBDA_FUNCTION_TEST_NAME="my_function" - -pytest --runner-mode=cloud test_my_function.py -``` - -[↑ Back to top](#table-of-contents) - -## Quick start - -Ready to build your first durable function?
Here's a minimal example: - -```python -from aws_durable_execution_sdk_python import ( - DurableContext, - durable_execution, - durable_step, - StepContext, -) - -@durable_step -def greet_user(step_context: StepContext, name: str) -> str: - """Generate a greeting.""" - return f"Hello {name}!" - -@durable_execution -def handler(event: dict, context: DurableContext) -> str: - """Simple durable function.""" - name = event.get("name", "World") - greeting = context.step(greet_user(name)) - return greeting -``` - -Deploy this to Lambda and you have a durable function. The `greet_user` step is checkpointed automatically. - -### Using a custom boto3 Lambda client - -If you need to customize the boto3 Lambda client used for durable execution operations (for example, to configure custom endpoints, retry settings, or credentials), you can pass a `boto3_client` parameter to the decorator. The client must be a boto3 Lambda client: - -```python -import boto3 -from botocore.config import Config -from aws_durable_execution_sdk_python import durable_execution, DurableContext - -# Create a custom boto3 Lambda client with specific configuration -custom_lambda_client = boto3.client( - 'lambda', - config=Config( - retries={'max_attempts': 5, 'mode': 'adaptive'}, - connect_timeout=10, - read_timeout=60, - ) -) - -@durable_execution(boto3_client=custom_lambda_client) -def handler(event: dict, context: DurableContext) -> dict: - # Your durable function logic - return {"status": "success"} -``` - -The custom Lambda client is used for all checkpoint and state management operations. If you don't provide a `boto3_client`, the SDK initializes a default Lambda client from your environment. 
- -[↑ Back to top](#table-of-contents) - -## Next steps - -Now that you've built your first durable function, explore the core features: - -**Learn the operations:** -- [Steps](core/steps.md) - Execute code with retry strategies and checkpointing -- [Wait operations](core/wait.md) - Pause execution for seconds, minutes, or hours -- [Callbacks](core/callbacks.md) - Wait for external systems to respond -- [Child contexts](core/child-contexts.md) - Organize complex workflows -- [Parallel operations](core/parallel.md) - Run multiple operations concurrently -- [Map operations](core/map.md) - Process collections in parallel - -**Dive deeper:** -- [Error handling](advanced/error-handling.md) - Handle failures and implement retry strategies -- [Testing patterns](testing-patterns/basic-tests.md) - Write effective tests for your workflows -- [Best practices](best-practices.md) - Avoid common pitfalls - -[↑ Back to top](#table-of-contents) - -## See also - -- [Documentation index](index.md) - Browse all guides and examples -- [Architecture diagrams](architecture.md) - Class diagrams and concurrency flows -- [Logger integration](core/logger.md) - Replay-safe structured logging -- [Examples directory](https://github.com/awslabs/aws-durable-execution-sdk-python/tree/main/examples) - More working examples - -[↑ Back to top](#table-of-contents) - -## License - -See the [LICENSE](../LICENSE) file for our project's licensing. - -[↑ Back to top](#table-of-contents) diff --git a/docs/index.md b/docs/index.md deleted file mode 100644 index 60109dfb..00000000 --- a/docs/index.md +++ /dev/null @@ -1,205 +0,0 @@ -# AWS Durable Execution SDK for Python - -> **Using JavaScript or TypeScript?** Check out the [AWS Durable Execution SDK for JavaScript](https://github.com/aws/aws-durable-execution-sdk-js) instead. 
- -## Table of Contents - -- [What is the Durable Execution SDK?](#what-is-the-durable-execution-sdk) -- [Key features](#key-features) -- [Installation](#installation) -- [Quick example](#quick-example) -- [Core concepts](#core-concepts) -- [Architecture](#architecture) -- [Use cases](#use-cases) -- [Getting help](#getting-help) -- [License](#license) - -## What is the Durable Execution SDK? - -The AWS Durable Execution SDK for Python lets you build reliable, long-running workflows in AWS Lambda. Your functions can pause execution, wait for external events, retry failed operations, and resume exactly where they left off, even if Lambda recycles your execution environment. - -The SDK provides a `DurableContext` that gives you operations like steps, waits, callbacks, and parallel execution. Each operation is checkpointed automatically, so your workflow state is preserved across interruptions. - -[↑ Back to top](#table-of-contents) - -## Key features - -- **Automatic checkpointing** - Your workflow state is saved automatically after each operation -- **Durable steps** - Execute code with configurable retry strategies and at-most-once or at-least-once semantics -- **Wait operations** - Pause execution for seconds, minutes, or hours without blocking Lambda resources -- **Callbacks** - Wait for external systems to respond with results or approvals -- **Parallel execution** - Run multiple operations concurrently with configurable completion criteria -- **Map operations** - Process collections in parallel with batching and failure tolerance -- **Child contexts** - Isolate nested workflows for better organization and error handling -- **Structured logging** - Integrate with your logger to track execution flow and debug issues - -[↑ Back to top](#table-of-contents) - -## Installation - -Install the SDK using pip: - -```console -pip install aws-durable-execution-sdk-python -``` - -[↑ Back to top](#table-of-contents) - -## Quick example - -Here's a simple durable function 
that processes an order: - -```python -from aws_durable_execution_sdk_python import ( - DurableContext, - durable_execution, - durable_step, - StepContext, -) -from aws_durable_execution_sdk_python.config import Duration - -@durable_step -def validate_order(step_context: StepContext, order_id: str) -> dict: - # Validation logic here - return {"order_id": order_id, "valid": True} - -@durable_step -def charge_payment(step_context: StepContext, order_id: str, amount: float) -> dict: - # Payment processing logic here - return {"transaction_id": "txn_123", "status": "completed"} - -@durable_step -def fulfill_order(step_context: StepContext, order_id: str) -> dict: - # Fulfillment logic here - return {"tracking_number": "TRK123456"} - -@durable_execution -def process_order(event: dict, context: DurableContext) -> dict: - order_id = event["order_id"] - amount = event["amount"] - - # Step 1: Validate the order - validation = context.step(validate_order(order_id)) - - if not validation["valid"]: - return {"status": "failed", "reason": "invalid_order"} - - # Step 2: Charge payment - payment = context.step(charge_payment(order_id, amount)) - - # Step 3: Wait for payment confirmation (simulated) - context.wait(Duration.from_seconds(5)) - - # Step 4: Fulfill the order - fulfillment = context.step(fulfill_order(order_id)) - - return { - "status": "completed", - "order_id": order_id, - "transaction_id": payment["transaction_id"], - "tracking_number": fulfillment["tracking_number"] - } -``` - -Each `context.step()` call is checkpointed automatically. If Lambda recycles your execution environment, the function resumes from the last completed step. - -[↑ Back to top](#table-of-contents) - -## Core concepts - -### Durable functions - -A durable function is a Lambda function decorated with `@durable_execution` that can be checkpointed and resumed. The function receives a `DurableContext` that provides methods for durable operations. - -### Operations - -Operations are units of work in a durable execution. 
Each operation type serves a specific purpose: - -- **Steps** - Execute code and checkpoint the result with retry support -- **Waits** - Pause execution for a specified duration without blocking Lambda -- **Callbacks** - Wait for external systems to respond with results -- **Invoke** - Call other durable functions to compose complex workflows -- **Child contexts** - Isolate nested workflows for better organization -- **Parallel** - Execute multiple operations concurrently with completion criteria -- **Map** - Process collections in parallel with batching and failure tolerance - -### Checkpoints - -Checkpoints are saved states of execution that allow resumption. When your function calls `context.step()` or other operations, the SDK creates a checkpoint and sends it to AWS. If Lambda recycles your environment or your function waits for an external event, execution can resume from the last checkpoint. - -### Replay - -When your function resumes, completed operations don't re-execute. Instead, they return their checkpointed results instantly. This means your function code runs multiple times, but side effects only happen once per operation. - -### Decorators - -The SDK provides decorators to mark functions as durable: - -- `@durable_execution` - Marks your Lambda handler as a durable function -- `@durable_step` - Marks a function that can be used with `context.step()` -- `@durable_with_child_context` - Marks a function that receives a child context - -[↑ Back to top](#table-of-contents) - -## Architecture - -The SDK integrates with AWS Lambda's durable execution service to provide reliable, long-running workflows. Here's how it works: - -1. **Execution starts** - Lambda invokes your function with a `DurableContext` -2. **Operations checkpoint** - Each `context.step()`, `context.wait()`, or other operation creates a checkpoint -3. **State is saved** - Checkpoints are sent to the durable execution service and persisted -4. 
**Execution may pause** - Lambda can recycle your environment or wait for external events -5. **Execution resumes** - When ready, Lambda invokes your function again with the saved state -6. **Operations replay** - Completed operations return their saved results instantly -7. **New operations execute** - Your function continues from where it left off - -### Key components - -- **DurableContext** - Main interface for durable operations, provided by Lambda -- **ExecutionState** - Manages checkpoints and tracks operation results -- **Operation handlers** - Execute steps, waits, callbacks, and other operations -- **Checkpoint batching** - Groups multiple checkpoints into efficient API calls -- **SerDes system** - Serializes and deserializes operation inputs and results - -### Checkpointing - -The SDK uses a background thread to batch checkpoints for efficiency. Critical operations (like step starts with at-most-once semantics) block until the checkpoint is confirmed. Non-critical operations (like observability checkpoints) are asynchronous for better performance. - -[**See architecture diagrams**](architecture.md) for class diagrams and concurrency flows. - -[↑ Back to top](#table-of-contents) - -## Use cases - -The SDK helps you build: - -**Order processing workflows** - Validate orders, charge payments, and fulfill shipments with automatic retry on failures. - -**Approval workflows** - Wait for human approvals or external system responses using callbacks. - -**Data processing pipelines** - Process large datasets in parallel with map operations and failure tolerance. - -**Multi-step integrations** - Coordinate calls to multiple services with proper error handling and state management. - -**Long-running tasks** - Execute workflows that take minutes or hours without blocking Lambda resources. - -**Saga patterns** - Implement distributed transactions with compensation logic for failures. 
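
The compensation idea behind the saga use case can be sketched in plain Python. This is only an illustration under stated assumptions: `run_saga`, the `Stage` alias, and the stage names are hypothetical and are not part of the SDK, and a real durable function would presumably wrap each action and compensation in its own checkpointed step.

```python
from typing import Callable

# Hypothetical helper types, not SDK API: each stage pairs a forward action
# with the compensation that undoes it.
Stage = tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(stages: list[Stage]) -> list[str]:
    """Run stages in order; on failure, undo completed stages in reverse."""
    log: list[str] = []
    completed: list[Stage] = []
    for stage in stages:
        name, action, _compensate = stage
        try:
            action()
        except Exception:
            log.append(f"{name}: failed")
            # Roll back everything that already succeeded, newest first.
            for done_name, _action, compensate in reversed(completed):
                compensate()
                log.append(f"{done_name}: compensated")
            return log
        completed.append(stage)
        log.append(f"{name}: done")
    return log


def fail() -> None:
    """Stand-in for a stage that cannot complete."""
    raise RuntimeError("shipping unavailable")


saga_log = run_saga([
    ("reserve_stock", lambda: None, lambda: None),
    ("charge_payment", lambda: None, lambda: None),
    ("ship_order", fail, lambda: None),
])
```

Compensations run newest-first so that each undo sees the state its forward action left behind.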
- -[↑ Back to top](#table-of-contents) - -## Getting help - -**Documentation** - You're reading it! Use the navigation above to find specific topics. - -**Examples** - Check the `examples/` directory in the repository for working code samples. - -**Issues** - Report bugs or request features on the [GitHub repository](https://github.com/awslabs/aws-durable-execution-sdk-python). - -**Contributing** - See [CONTRIBUTING.md](../CONTRIBUTING.md) for guidelines on contributing to the project. - -[↑ Back to top](#table-of-contents) - -## License - -See the [LICENSE](../LICENSE) file for our project's licensing. - -[↑ Back to top](#table-of-contents) diff --git a/docs/testing-patterns/.gitkeep b/docs/testing-patterns/.gitkeep deleted file mode 100644 index 97481357..00000000 --- a/docs/testing-patterns/.gitkeep +++ /dev/null @@ -1 +0,0 @@ -# This file will be removed once the directory has content diff --git a/docs/testing-patterns/basic-tests.md b/docs/testing-patterns/basic-tests.md deleted file mode 100644 index fc099081..00000000 --- a/docs/testing-patterns/basic-tests.md +++ /dev/null @@ -1,701 +0,0 @@ -# Basic Test Patterns - -## Table of Contents - -- [Overview](#overview) -- [Prerequisites](#prerequisites) -- [Project structure](#project-structure) -- [Getting started](#getting-started) -- [Status checking patterns](#status-checking-patterns) -- [Result verification patterns](#result-verification-patterns) -- [Operation-specific assertions](#operation-specific-assertions) -- [Test organization tips](#test-organization-tips) -- [FAQ](#faq) -- [See also](#see-also) - -[← Back to main index](../index.md) - -## Overview - -When you test durable functions, you need to verify that your function executed successfully, returned the expected result, and that operations like steps or waits ran correctly. This document shows you common patterns for writing these tests with simple assertions using the testing SDK. 
- -The testing SDK (`aws-durable-execution-sdk-python-testing`) provides tools to run and inspect durable functions locally without deploying to AWS. Use these patterns as building blocks for your own tests, whether you're checking a simple calculation or inspecting individual operations. - -[↑ Back to top](#table-of-contents) - -## Prerequisites - -To test durable functions, you need both SDKs installed: - -```console -# Install the core SDK (for writing durable functions) -pip install aws-durable-execution-sdk-python - -# Install the testing SDK (for testing durable functions) -pip install aws-durable-execution-sdk-python-testing - -# Install pytest (test framework) -pip install pytest -``` - -The core SDK provides the decorators and context for writing durable functions. The testing SDK provides the test runner and assertions for testing them. - -[↑ Back to top](#table-of-contents) - -## Project structure - -Here's a typical project structure for testing durable functions: - -``` -my-project/ -├── src/ -│ ├── __init__.py -│ └── my_function.py # Your durable function -├── test/ -│ ├── __init__.py -│ ├── conftest.py # Pytest configuration and fixtures -│ └── test_my_function.py # Your tests -├── requirements.txt -└── pytest.ini -``` - -**Key files:** - -- `src/my_function.py` - Contains your durable function with `@durable_execution` decorator -- `test/conftest.py` - Configures the `durable_runner` fixture for pytest -- `test/test_my_function.py` - Contains your test cases using the `durable_runner` fixture - -**Example conftest.py:** - -```python -import pytest -from aws_durable_execution_sdk_python_testing.runner import DurableFunctionTestRunner - -@pytest.fixture -def durable_runner(request): - """Pytest fixture that provides a test runner.""" - marker = request.node.get_closest_marker("durable_execution") - if not marker: - pytest.fail("Test must be marked with @pytest.mark.durable_execution") - - handler = 
marker.kwargs.get("handler") - runner = DurableFunctionTestRunner(handler=handler) - - yield runner -``` - -[↑ Back to top](#table-of-contents) - -## Getting started - -Here's a simple durable function: - -```python -from aws_durable_execution_sdk_python import DurableContext, durable_execution - -@durable_execution -def handler(event: dict, context: DurableContext) -> str: - """Simple hello world durable function.""" - return "Hello World!" -``` - -And here's how you test it: - -```python -import pytest -from aws_durable_execution_sdk_python.execution import InvocationStatus -from test.conftest import deserialize_operation_payload - -@pytest.mark.durable_execution( - handler=handler, - lambda_function_name="hello world", -) -def test_hello_world(durable_runner): - """Test hello world example.""" - with durable_runner: - result = durable_runner.run(input="test", timeout=10) - - assert result.status is InvocationStatus.SUCCEEDED - assert deserialize_operation_payload(result.result) == "Hello World!" -``` - -This test: -1. Marks the test with `@pytest.mark.durable_execution` to configure the runner -2. Uses the `durable_runner` fixture to execute the function -3. Checks the execution status -4. 
Verifies the final result - -[↑ Back to top](#table-of-contents) - -## Status checking patterns - -### Check for successful execution - -The most basic pattern verifies that your function completed successfully: - -```python -@pytest.mark.durable_execution( - handler=my_handler, - lambda_function_name="my_function", -) -def test_success(durable_runner): - """Test successful execution.""" - with durable_runner: - result = durable_runner.run(input={"data": "test"}, timeout=10) - - assert result.status is InvocationStatus.SUCCEEDED -``` - -### Check for expected failures - -Test that your function fails correctly when given invalid input: - -```python -@pytest.mark.durable_execution( - handler=handler_with_validation, - lambda_function_name="validation_function", -) -def test_validation_failure(durable_runner): - """Test that invalid input causes failure.""" - with durable_runner: - result = durable_runner.run(input={"invalid": "data"}, timeout=10) - - assert result.status is InvocationStatus.FAILED - assert "ValidationError" in str(result.error) -``` - -### Check execution with timeout - -Verify that your function completes within the expected time: - -```python -@pytest.mark.durable_execution( - handler=quick_handler, - lambda_function_name="quick_function", -) -def test_completes_quickly(durable_runner): - """Test that function completes within timeout.""" - with durable_runner: - # Use a short timeout to verify quick execution - result = durable_runner.run(input={}, timeout=5) - - assert result.status is InvocationStatus.SUCCEEDED -``` - -[↑ Back to top](#table-of-contents) - -## Result verification patterns - -### Verify simple return values - -Check that your function returns the expected value: - -```python -from test.conftest import deserialize_operation_payload - -@pytest.mark.durable_execution( - handler=calculator_handler, - lambda_function_name="calculator", -) -def test_calculation_result(durable_runner): - """Test calculation returns correct result.""" - 
with durable_runner: - result = durable_runner.run(input={"a": 5, "b": 3}, timeout=10) - - assert result.status is InvocationStatus.SUCCEEDED - assert deserialize_operation_payload(result.result) == 8 -``` - -### Verify complex return values - -Check specific fields in complex return values: - -```python -@pytest.mark.durable_execution( - handler=order_handler, - lambda_function_name="order_processor", -) -def test_order_processing(durable_runner): - """Test order processing returns correct structure.""" - with durable_runner: - result = durable_runner.run( - input={"order_id": "order-123", "amount": 100.0}, - timeout=10 - ) - - assert result.status is InvocationStatus.SUCCEEDED - - order_result = deserialize_operation_payload(result.result) - assert order_result["order_id"] == "order-123" - assert order_result["status"] == "completed" - assert order_result["amount"] == 100.0 -``` - -### Verify list results - -Check that your function returns the expected list of values: - -```python -@pytest.mark.durable_execution( - handler=parallel_handler, - lambda_function_name="parallel_tasks", -) -def test_parallel_results(durable_runner): - """Test parallel operations return all results.""" - with durable_runner: - result = durable_runner.run(input={}, timeout=10) - - assert result.status is InvocationStatus.SUCCEEDED - - results = deserialize_operation_payload(result.result) - assert len(results) == 3 - assert results == [ - "Task 1 complete", - "Task 2 complete", - "Task 3 complete", - ] -``` - -[↑ Back to top](#table-of-contents) - -## Operation-specific assertions - -### Verify step operations - -Here's a function with a step: - -```python -from aws_durable_execution_sdk_python import ( - DurableContext, - durable_execution, - durable_step, - StepContext, -) - -@durable_step -def add_numbers(step_context: StepContext, a: int, b: int) -> int: - return a + b - -@durable_execution -def handler(event: dict, context: DurableContext) -> int: - result = 
context.step(add_numbers(5, 3)) - return result -``` - -Check that the step executed and produced the expected result: - -```python -import pytest -from aws_durable_execution_sdk_python.execution import InvocationStatus -from test.conftest import deserialize_operation_payload - -@pytest.mark.durable_execution( - handler=handler, - lambda_function_name="step_function", -) -def test_step_execution(durable_runner): - """Test step executes correctly.""" - with durable_runner: - result = durable_runner.run(input={}, timeout=10) - - assert result.status is InvocationStatus.SUCCEEDED - - # Get step by name - step_result = result.get_step("add_numbers") - assert deserialize_operation_payload(step_result.result) == 8 -``` - -### Verify wait operations - -Here's a function with a wait: - -```python -from aws_durable_execution_sdk_python import DurableContext, durable_execution -from aws_durable_execution_sdk_python.config import Duration - -@durable_execution -def handler(event: dict, context: DurableContext) -> str: - context.wait(Duration.from_seconds(5)) - return "Wait completed" -``` - -Check that the wait operation was created with correct timing: - -```python -@pytest.mark.durable_execution( - handler=handler, - lambda_function_name="wait_function", -) -def test_wait_operation(durable_runner): - """Test wait operation is created.""" - with durable_runner: - result = durable_runner.run(input={}, timeout=10) - - assert result.status is InvocationStatus.SUCCEEDED - - # Find wait operations - wait_ops = [ - op for op in result.operations - if op.operation_type.value == "WAIT" - ] - assert len(wait_ops) == 1 - assert wait_ops[0].scheduled_end_timestamp is not None -``` - -### Verify callback operations - -Here's a function that creates a callback: - -```python -from aws_durable_execution_sdk_python import DurableContext, durable_execution -from aws_durable_execution_sdk_python.config import CallbackConfig - -@durable_execution -def handler(event: dict, context: 
DurableContext) -> str: - callback_config = CallbackConfig( - timeout_seconds=120, - heartbeat_timeout_seconds=60 - ) - - callback = context.create_callback( - name="example_callback", - config=callback_config - ) - - return f"Callback created with ID: {callback.callback_id}" -``` - -Check that the callback was created with correct configuration: - -```python -@pytest.mark.durable_execution( - handler=handler, - lambda_function_name="callback_function", -) -def test_callback_creation(durable_runner): - """Test callback is created correctly.""" - with durable_runner: - result = durable_runner.run(input={}, timeout=10) - - assert result.status is InvocationStatus.SUCCEEDED - - # Find callback operations - callback_ops = [ - op for op in result.operations - if op.operation_type.value == "CALLBACK" - ] - assert len(callback_ops) == 1 - - callback_op = callback_ops[0] - assert callback_op.name == "example_callback" - assert callback_op.callback_id is not None -``` - -### Verify child context operations - -Here's a function with a child context: - -```python -from aws_durable_execution_sdk_python import ( - DurableContext, - durable_execution, - durable_with_child_context, -) - -@durable_with_child_context -def child_operation(ctx: DurableContext, value: int) -> int: - return ctx.step(lambda _: value * 2, name="multiply") - -@durable_execution -def handler(event: dict, context: DurableContext) -> str: - result = context.run_in_child_context(child_operation(5)) - return f"Child context result: {result}" -``` - -Check that the child context executed correctly: - -```python -@pytest.mark.durable_execution( - handler=handler, - lambda_function_name="child_context_function", -) -def test_child_context(durable_runner): - """Test child context executes.""" - with durable_runner: - result = durable_runner.run(input={}, timeout=10) - - assert result.status is InvocationStatus.SUCCEEDED - - # Find child context operations - context_ops = [ - op for op in result.operations - if 
op.operation_type.value == "CONTEXT" - ] - assert len(context_ops) >= 1 -``` - -### Verify parallel operations - -Here's a function that runs three named steps: - -```python -from aws_durable_execution_sdk_python import DurableContext, durable_execution - -@durable_execution -def handler(event: dict, context: DurableContext) -> list[str]: - # Execute multiple named steps; each context.step() call returns its result before the next starts - task1 = context.step(lambda _: "Task 1 complete", name="task1") - task2 = context.step(lambda _: "Task 2 complete", name="task2") - task3 = context.step(lambda _: "Task 3 complete", name="task3") - - # Collect the results in order - return [task1, task2, task3] -``` - -Check that all three steps executed: - -```python -from aws_durable_execution_sdk_python.lambda_service import OperationType - -@pytest.mark.durable_execution( - handler=handler, - lambda_function_name="parallel_function", -) -def test_parallel_operations(durable_runner): - """Test parallel operations execute.""" - with durable_runner: - result = durable_runner.run(input={}, timeout=10) - - assert result.status is InvocationStatus.SUCCEEDED - - # Find all step operations - step_ops = [ - op for op in result.operations - if op.operation_type == OperationType.STEP - ] - assert len(step_ops) == 3 - - # Verify step names - step_names = {op.name for op in step_ops} - assert step_names == {"task1", "task2", "task3"} -``` - -[↑ Back to top](#table-of-contents) - -## Test organization tips - -### Use descriptive test names - -Name your tests to clearly describe what they verify: - -```python -# Good - describes what is being tested -def test_order_processing_succeeds_with_valid_input(durable_runner): - pass - -def test_order_processing_fails_with_invalid_order_id(durable_runner): - pass - -# Avoid - vague or unclear -def test_order(durable_runner): - pass - -def test_case_1(durable_runner): - pass -``` - -### Group related tests - -Organize tests by feature or functionality: - -```python -# 
tests/test_order_processing.py -class TestOrderValidation: - """Tests for order validation.""" - - @pytest.mark.durable_execution(handler=handler, lambda_function_name="orders") - def test_valid_order(self, durable_runner): - """Test valid order is accepted.""" - pass - - @pytest.mark.durable_execution(handler=handler, lambda_function_name="orders") - def test_invalid_order_id(self, durable_runner): - """Test invalid order ID is rejected.""" - pass - -class TestOrderFulfillment: - """Tests for order fulfillment.""" - - @pytest.mark.durable_execution(handler=handler, lambda_function_name="orders") - def test_fulfillment_success(self, durable_runner): - """Test successful order fulfillment.""" - pass -``` - -### Use fixtures for common test data - -Create fixtures for test data you use across multiple tests: - -```python -# conftest.py -@pytest.fixture -def valid_order(): - """Provide valid order data.""" - return { - "order_id": "order-123", - "customer_id": "customer-456", - "amount": 100.0, - "items": [ - {"product_id": "prod-1", "quantity": 2}, - {"product_id": "prod-2", "quantity": 1}, - ], - } - -# test_orders.py -@pytest.mark.durable_execution(handler=handler, lambda_function_name="orders") -def test_order_processing(durable_runner, valid_order): - """Test order processing with valid data.""" - with durable_runner: - result = durable_runner.run(input=valid_order, timeout=10) - - assert result.status is InvocationStatus.SUCCEEDED -``` - -### Add docstrings to tests - -Document what each test verifies: - -```python -@pytest.mark.durable_execution(handler=handler, lambda_function_name="payment") -def test_payment_with_retry(durable_runner): - """Test payment processing retries on transient failures. - - This test verifies that: - 1. Payment step retries on RuntimeError - 2. Function eventually succeeds after retries - 3. 
Final result includes transaction ID - """ - with durable_runner: - result = durable_runner.run(input={"amount": 50.0}, timeout=30) - - assert result.status is InvocationStatus.SUCCEEDED -``` - -### Use parametrized tests for similar cases - -Test multiple inputs with the same logic using `pytest.mark.parametrize`: - -```python -@pytest.mark.parametrize("a,b,expected", [ - (5, 3, 8), - (10, 20, 30), - (0, 0, 0), - (-5, 5, 0), -]) -@pytest.mark.durable_execution(handler=add_handler, lambda_function_name="calculator") -def test_addition(durable_runner, a, b, expected): - """Test addition with various inputs.""" - with durable_runner: - result = durable_runner.run(input={"a": a, "b": b}, timeout=10) - - assert result.status is InvocationStatus.SUCCEEDED - assert deserialize_operation_payload(result.result) == expected -``` - -### Keep tests focused - -Each test should verify one specific behavior: - -```python -# Good - focused on one behavior -@pytest.mark.durable_execution(handler=handler, lambda_function_name="orders") -def test_order_validation_succeeds(durable_runner): - """Test order validation with valid input.""" - with durable_runner: - result = durable_runner.run(input={"order_id": "order-123"}, timeout=10) - - assert result.status is InvocationStatus.SUCCEEDED - -@pytest.mark.durable_execution(handler=handler, lambda_function_name="orders") -def test_order_validation_fails_missing_id(durable_runner): - """Test order validation fails without order ID.""" - with durable_runner: - result = durable_runner.run(input={}, timeout=10) - - assert result.status is InvocationStatus.FAILED - -# Avoid - testing multiple behaviors -@pytest.mark.durable_execution(handler=handler, lambda_function_name="orders") -def test_order_validation(durable_runner): - """Test order validation.""" - # Test valid input - result1 = durable_runner.run(input={"order_id": "order-123"}, timeout=10) - assert result1.status is InvocationStatus.SUCCEEDED - - # Test invalid input - result2 = 
durable_runner.run(input={}, timeout=10) - assert result2.status is InvocationStatus.FAILED -``` - -[↑ Back to top](#table-of-contents) - -## FAQ - -**Q: Do I need to deploy my function to test it?** - -A: No, the test runner executes your function locally. You only need to deploy for cloud testing mode. - -**Q: How do I test functions with external dependencies?** - -A: Mock external dependencies in your test setup. The test runner executes your function code as-is, so standard Python mocking works. - -**Q: Can I test multiple functions in one test file?** - -A: Yes, use different `@pytest.mark.durable_execution` markers for each function you want to test. - -**Q: How do I access operation results?** - -A: Use `result.get_step(name)` for steps, or iterate through `result.operations` to find specific operation types. - -**Q: What's the difference between result.result and step.result?** - -A: `result.result` is the final return value of your handler function. `step.result` is the return value of a specific step operation. - -**Q: How do I test error scenarios?** - -A: Check that `result.status is InvocationStatus.FAILED` and inspect `result.error` for the error message. - -**Q: Can I run tests in parallel?** - -A: Yes, use pytest-xdist: `pytest -n auto` to run tests in parallel. - -**Q: How do I debug failing tests?** - -A: Add print statements or use a debugger. The test runner executes your code locally, so standard debugging tools work. - -**Q: What timeout should I use?** - -A: Use a timeout slightly longer than your function's expected execution time. For most tests, 10-30 seconds is sufficient. - -**Q: How do I test functions that use environment variables?** - -A: Set environment variables in your test setup or use pytest fixtures to manage them. 
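
For the environment-variable question, one standard-library option is `unittest.mock.patch.dict`, which overrides `os.environ` for the duration of a `with` block and restores it afterwards. The sketch below is a minimal illustration: `resolve_table_name` and the `ORDERS_TABLE` variable are hypothetical names standing in for whatever configuration your handler reads, and the durable runner call is left out to keep the example self-contained.

```python
import os
from unittest import mock


def resolve_table_name() -> str:
    """Hypothetical helper a handler might call to read its configuration."""
    return os.environ.get("ORDERS_TABLE", "orders-default")


def test_reads_table_name_from_environment() -> None:
    # patch.dict restores os.environ when the block exits, so the override
    # does not leak into other tests.
    with mock.patch.dict(os.environ, {"ORDERS_TABLE": "orders-test"}):
        assert resolve_table_name() == "orders-test"
```

In a pytest suite, the built-in `monkeypatch` fixture (`monkeypatch.setenv`) gives the same isolation; inside the patched block you would call `durable_runner.run(...)` as in the earlier examples.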
- -[↑ Back to top](#table-of-contents) - -## See also - -- [Complex workflows](complex-workflows.md) - Testing multi-step workflows -- [Best practices](../best-practices.md) - Testing recommendations -- [Testing modes](../advanced/testing-modes.md) - Local and cloud test execution -- [Steps](../core/steps.md) - Testing step operations -- [Wait operations](../core/wait.md) - Testing wait operations -- [Callbacks](../core/callbacks.md) - Testing callback operations - -[↑ Back to top](#table-of-contents) - -## License - -See the [LICENSE](../../LICENSE) file for our project's licensing. - -[↑ Back to top](#table-of-contents) diff --git a/docs/testing-patterns/complex-workflows.md b/docs/testing-patterns/complex-workflows.md deleted file mode 100644 index 6781b06f..00000000 --- a/docs/testing-patterns/complex-workflows.md +++ /dev/null @@ -1,675 +0,0 @@ -# Complex Workflow Testing - -## Table of Contents - -- [Overview](#overview) -- [Prerequisites](#prerequisites) -- [Multi-step workflows](#multi-step-workflows) -- [Nested child contexts](#nested-child-contexts) -- [Parallel operations](#parallel-operations) -- [Error scenarios](#error-scenarios) -- [Timeout handling](#timeout-handling) -- [Polling patterns](#polling-patterns) -- [FAQ](#faq) -- [See also](#see-also) - -[← Back to main index](../index.md) - -## Overview - -When your workflows involve multiple steps, nested contexts, or parallel operations, you need to verify more than just the final result. You'll want to check intermediate states, operation ordering, error handling, and timeout behavior. - -This guide shows you how to test workflows that chain operations together, handle errors gracefully, and implement polling patterns. 
- -[↑ Back to top](#table-of-contents) - -## Prerequisites - -You need both SDKs installed: - -```console -pip install aws-durable-execution-sdk-python -pip install aws-durable-execution-sdk-python-testing -pip install pytest -``` - -If you're new to testing durable functions, start with [Basic test patterns](basic-tests.md) first. - -[↑ Back to top](#table-of-contents) - -## Multi-step workflows - -### Sequential operations - - -Here's a workflow that processes an order through validation, payment, and fulfillment: - -```python -from aws_durable_execution_sdk_python import ( - DurableContext, - durable_execution, - durable_step, - StepContext, -) - -@durable_step -def validate_order(step_context: StepContext, order_id: str) -> dict: - return {"order_id": order_id, "status": "validated"} - -@durable_step -def process_payment(step_context: StepContext, order: dict) -> dict: - return {**order, "payment_status": "completed"} - -@durable_step -def fulfill_order(step_context: StepContext, order: dict) -> dict: - return {**order, "fulfillment_status": "shipped"} - -@durable_execution -def handler(event: dict, context: DurableContext) -> dict: - order_id = event["order_id"] - - validated = context.step(validate_order(order_id), name="validate") - paid = context.step(process_payment(validated), name="payment") - fulfilled = context.step(fulfill_order(paid), name="fulfillment") - - return fulfilled -``` - -Verify all steps execute in order: - -```python -import pytest -from aws_durable_execution_sdk_python.execution import InvocationStatus -from aws_durable_execution_sdk_python.lambda_service import OperationType -from test.conftest import deserialize_operation_payload - -@pytest.mark.durable_execution(handler=handler, lambda_function_name="order_workflow") -def test_order_workflow(durable_runner): - """Test order processing executes all steps.""" - with durable_runner: - result = durable_runner.run(input={"order_id": "order-123"}, timeout=30) - - assert result.status is 
InvocationStatus.SUCCEEDED - - # Check final result - final_result = deserialize_operation_payload(result.result) - assert final_result["order_id"] == "order-123" - assert final_result["payment_status"] == "completed" - assert final_result["fulfillment_status"] == "shipped" - - # Verify all three steps ran - step_ops = [op for op in result.operations if op.operation_type == OperationType.STEP] - assert len(step_ops) == 3 - - # Check step order - step_names = [op.name for op in step_ops] - assert step_names == ["validate", "payment", "fulfillment"] -``` - -[↑ Back to top](#table-of-contents) - -### Conditional branching - -Test different execution paths based on input: - -```python -@durable_execution -def handler(event: dict, context: DurableContext) -> str: - amount = event.get("amount", 0) - - context.step(lambda _: amount, name="validate_amount") - - if amount > 1000: - context.step(lambda _: "Manager approval required", name="approval") - context.wait(Duration.from_seconds(10), name="approval_wait") - result = context.step(lambda _: "High-value order processed", name="process_high") - else: - result = context.step(lambda _: "Standard order processed", name="process_standard") - - return result -``` - -Test both paths separately: - -```python -@pytest.mark.durable_execution(handler=handler, lambda_function_name="conditional_workflow") -def test_high_value_path(durable_runner): - """Test high-value orders require approval.""" - with durable_runner: - result = durable_runner.run(input={"amount": 1500}, timeout=30) - - assert result.status is InvocationStatus.SUCCEEDED - assert deserialize_operation_payload(result.result) == "High-value order processed" - - # Verify approval step exists - approval_step = result.get_step("approval") - assert approval_step is not None - -@pytest.mark.durable_execution(handler=handler, lambda_function_name="conditional_workflow") -def test_standard_path(durable_runner): - """Test standard orders skip approval.""" - with 
durable_runner: - result = durable_runner.run(input={"amount": 500}, timeout=30) - - assert result.status is InvocationStatus.SUCCEEDED - - # Verify no approval step - step_names = [op.name for op in result.operations if op.operation_type == OperationType.STEP] - assert "approval" not in step_names -``` - -[↑ Back to top](#table-of-contents) - -## Nested child contexts - - -### Single child context - -Child contexts isolate operations: - -```python -from aws_durable_execution_sdk_python import ( - DurableContext, - durable_execution, - durable_with_child_context, -) - -@durable_with_child_context -def process_item(ctx: DurableContext, item_id: str) -> dict: - ctx.step(lambda _: f"Validating {item_id}", name="validate") - result = ctx.step( - lambda _: {"item_id": item_id, "status": "processed"}, - name="process" - ) - return result - -@durable_execution -def handler(event: dict, context: DurableContext) -> dict: - item_id = event["item_id"] - result = context.run_in_child_context( - process_item(item_id), - name="item_processing" - ) - return result -``` - -Verify the child context executes: - -```python -@pytest.mark.durable_execution(handler=handler, lambda_function_name="child_context_workflow") -def test_child_context(durable_runner): - """Test child context execution.""" - with durable_runner: - result = durable_runner.run(input={"item_id": "item-123"}, timeout=30) - - assert result.status is InvocationStatus.SUCCEEDED - - # Check child context ran - context_ops = [op for op in result.operations if op.operation_type.value == "CONTEXT"] - assert len(context_ops) == 1 - assert context_ops[0].name == "item_processing" - - # Check child context result - child_result = result.get_context("item_processing") - child_data = deserialize_operation_payload(child_result.result) - assert child_data["item_id"] == "item-123" -``` - -[↑ Back to top](#table-of-contents) - -### Multiple child contexts - -Use multiple child contexts to organize operations: - -```python 
-@durable_with_child_context -def validate_data(ctx: DurableContext, data: dict) -> dict: - return ctx.step(lambda _: {**data, "validated": True}, name="validate") - -@durable_with_child_context -def transform_data(ctx: DurableContext, data: dict) -> dict: - return ctx.step(lambda _: {**data, "transformed": True}, name="transform") - -@durable_execution -def handler(event: dict, context: DurableContext) -> dict: - data = event["data"] - - validated = context.run_in_child_context(validate_data(data), name="validation") - transformed = context.run_in_child_context(transform_data(validated), name="transformation") - - return transformed -``` - -Verify both contexts execute: - -```python -@pytest.mark.durable_execution(handler=handler, lambda_function_name="multiple_contexts") -def test_multiple_child_contexts(durable_runner): - """Test multiple child contexts.""" - with durable_runner: - result = durable_runner.run(input={"data": {"value": 42}}, timeout=30) - - assert result.status is InvocationStatus.SUCCEEDED - - final_result = deserialize_operation_payload(result.result) - assert final_result["validated"] is True - assert final_result["transformed"] is True - - # Verify both contexts ran - context_ops = [op for op in result.operations if op.operation_type.value == "CONTEXT"] - assert len(context_ops) == 2 -``` - -[↑ Back to top](#table-of-contents) - -## Parallel operations - -### Basic parallel execution - -Multiple operations execute concurrently: - -```python -@durable_execution -def handler(event: dict, context: DurableContext) -> list[str]: - task1 = context.step(lambda _: "Task 1 complete", name="task1") - task2 = context.step(lambda _: "Task 2 complete", name="task2") - task3 = context.step(lambda _: "Task 3 complete", name="task3") - - return [task1, task2, task3] -``` - -Verify all operations execute: - -```python -@pytest.mark.durable_execution(handler=handler, lambda_function_name="parallel_ops") -def test_parallel_operations(durable_runner): - """Test 
parallel execution.""" - with durable_runner: - result = durable_runner.run(input={}, timeout=30) - - assert result.status is InvocationStatus.SUCCEEDED - - results = deserialize_operation_payload(result.result) - assert len(results) == 3 - - # Verify all steps ran - step_ops = [op for op in result.operations if op.operation_type == OperationType.STEP] - assert len(step_ops) == 3 - - step_names = {op.name for op in step_ops} - assert step_names == {"task1", "task2", "task3"} -``` - -[↑ Back to top](#table-of-contents) - -### Processing collections - - -Process collection items in parallel: - -```python -@durable_execution -def handler(event: dict, context: DurableContext) -> list[int]: - numbers = event.get("numbers", [1, 2, 3, 4, 5]) - - results = [] - for i, num in enumerate(numbers): - result = context.step(lambda _, n=num: n * 2, name=f"square_{i}") - results.append(result) - - return results -``` - -Verify collection processing: - -```python -@pytest.mark.durable_execution(handler=handler, lambda_function_name="parallel_collection") -def test_collection_processing(durable_runner): - """Test collection processing.""" - with durable_runner: - result = durable_runner.run(input={"numbers": [1, 2, 3, 4, 5]}, timeout=30) - - assert result.status is InvocationStatus.SUCCEEDED - assert deserialize_operation_payload(result.result) == [2, 4, 6, 8, 10] - - # Verify all steps ran - step_ops = [op for op in result.operations if op.operation_type == OperationType.STEP] - assert len(step_ops) == 5 -``` - -[↑ Back to top](#table-of-contents) - -## Error scenarios - -### Expected failures - -Test that your workflow fails correctly: - -```python -@durable_step -def validate_input(step_context: StepContext, value: int) -> int: - if value < 0: - raise ValueError("Value must be non-negative") - return value - -@durable_execution -def handler(event: dict, context: DurableContext) -> int: - value = event.get("value", 0) - validated = context.step(validate_input(value), 
name="validate") - return validated -``` - -Verify validation failures: - -```python -@pytest.mark.durable_execution(handler=handler, lambda_function_name="validation_workflow") -def test_validation_failure(durable_runner): - """Test validation fails with invalid input.""" - with durable_runner: - result = durable_runner.run(input={"value": -5}, timeout=30) - - assert result.status is InvocationStatus.FAILED - assert "Value must be non-negative" in str(result.error) -``` - -[↑ Back to top](#table-of-contents) - -### Retry behavior - -Test operations that retry on failure: - -```python -from aws_durable_execution_sdk_python.config import StepConfig -from aws_durable_execution_sdk_python.retries import ( - RetryStrategyConfig, - create_retry_strategy, -) - -attempt_count = 0 - -@durable_step -def unreliable_operation(step_context: StepContext) -> str: - global attempt_count - attempt_count += 1 - - if attempt_count < 3: - raise RuntimeError("Transient error") - - return "Operation succeeded" - -@durable_execution -def handler(event: dict, context: DurableContext) -> str: - retry_config = RetryStrategyConfig( - max_attempts=5, - retryable_error_types=[RuntimeError], - ) - - result = context.step( - unreliable_operation(), - config=StepConfig(create_retry_strategy(retry_config)), - name="unreliable" - ) - - return result -``` - -Verify retry succeeds: - -```python -@pytest.mark.durable_execution(handler=handler, lambda_function_name="retry_workflow") -def test_retry_behavior(durable_runner): - """Test operation retries on failure.""" - global attempt_count - attempt_count = 0 - - with durable_runner: - result = durable_runner.run(input={}, timeout=60) - - assert result.status is InvocationStatus.SUCCEEDED - assert deserialize_operation_payload(result.result) == "Operation succeeded" - assert attempt_count >= 3 -``` - -[↑ Back to top](#table-of-contents) - -### Partial failures - -Test workflows where some operations succeed before failure: - -```python 
-@durable_execution -def handler(event: dict, context: DurableContext) -> str: - context.step(lambda _: "Step 1 complete", name="step1") - context.step(lambda _: "Step 2 complete", name="step2") - context.step( - lambda _: (_ for _ in ()).throw(RuntimeError("Step 3 failed")), - name="step3" - ) - return "Should not reach here" -``` - -Verify partial execution: - -```python -@pytest.mark.durable_execution(handler=handler, lambda_function_name="partial_failure") -def test_partial_failure(durable_runner): - """Test workflow fails after some steps succeed.""" - with durable_runner: - result = durable_runner.run(input={}, timeout=30) - - assert result.status is InvocationStatus.FAILED - - # First two steps succeeded - step1 = result.get_step("step1") - assert deserialize_operation_payload(step1.result) == "Step 1 complete" - - step2 = result.get_step("step2") - assert deserialize_operation_payload(step2.result) == "Step 2 complete" - - assert "Step 3 failed" in str(result.error) -``` - -[↑ Back to top](#table-of-contents) - -## Timeout handling - -### Callback timeouts - - -Verify callback timeout configuration: - -```python -from aws_durable_execution_sdk_python.config import CallbackConfig - -@durable_execution -def handler(event: dict, context: DurableContext) -> str: - config = CallbackConfig(timeout_seconds=60, heartbeat_timeout_seconds=30) - callback = context.create_callback(name="approval_callback", config=config) - return f"Callback created: {callback.callback_id}" -``` - -Test callback configuration: - -```python -@pytest.mark.durable_execution(handler=handler, lambda_function_name="callback_timeout") -def test_callback_timeout(durable_runner): - """Test callback timeout configuration.""" - with durable_runner: - result = durable_runner.run(input={}, timeout=10) - - assert result.status is InvocationStatus.SUCCEEDED - - callback_ops = [op for op in result.operations if op.operation_type.value == "CALLBACK"] - assert len(callback_ops) == 1 - assert 
callback_ops[0].name == "approval_callback" -``` - -[↑ Back to top](#table-of-contents) - -### Long waits - -For workflows with long waits, verify configuration without actually waiting: - -```python -@durable_execution -def handler(event: dict, context: DurableContext) -> str: - context.step(lambda _: "Starting", name="start") - context.wait(Duration.from_seconds(3600), name="long_wait") # 1 hour - context.step(lambda _: "Continuing", name="continue") - return "Complete" -``` - -Test completes quickly: - -```python -@pytest.mark.durable_execution(handler=handler, lambda_function_name="long_wait") -def test_long_wait(durable_runner): - """Test long wait configuration.""" - with durable_runner: - result = durable_runner.run(input={}, timeout=10) - - assert result.status is InvocationStatus.SUCCEEDED - - # Verify wait exists - wait_ops = [op for op in result.operations if op.operation_type.value == "WAIT"] - assert len(wait_ops) == 1 - assert wait_ops[0].name == "long_wait" -``` - -[↑ Back to top](#table-of-contents) - -## Polling patterns - -### Wait-for-condition - -Poll until a condition is met: - -```python -@durable_execution -def handler(event: dict, context: DurableContext) -> int: - state = 0 - attempt = 0 - max_attempts = 5 - - while attempt < max_attempts: - attempt += 1 - - state = context.step(lambda _, s=state: s + 1, name=f"increment_{attempt}") - - if state >= 3: - break - - context.wait(Duration.from_seconds(1), name=f"wait_{attempt}") - - return state -``` - -Verify polling behavior: - -```python -@pytest.mark.durable_execution(handler=handler, lambda_function_name="polling") -def test_polling(durable_runner): - """Test wait-for-condition pattern.""" - with durable_runner: - result = durable_runner.run(input={}, timeout=30) - - assert result.status is InvocationStatus.SUCCEEDED - assert deserialize_operation_payload(result.result) == 3 - - # Should have 3 increment steps - step_ops = [op for op in result.operations if op.operation_type == 
OperationType.STEP] - assert len(step_ops) == 3 - - # Should have 2 waits (before reaching state 3) - wait_ops = [op for op in result.operations if op.operation_type.value == "WAIT"] - assert len(wait_ops) == 2 -``` - -[↑ Back to top](#table-of-contents) - -### Maximum attempts - -Test polling respects attempt limits: - -```python -@durable_execution -def handler(event: dict, context: DurableContext) -> dict: - target = event.get("target", 10) - state = 0 - attempt = 0 - max_attempts = 5 - - while attempt < max_attempts and state < target: - attempt += 1 - state = context.step(lambda _, s=state: s + 1, name=f"attempt_{attempt}") - - if state < target: - context.wait(Duration.from_seconds(1), name=f"wait_{attempt}") - - return {"state": state, "attempts": attempt, "reached_target": state >= target} -``` - -Test with unreachable target: - -```python -@pytest.mark.durable_execution(handler=handler, lambda_function_name="max_attempts") -def test_max_attempts(durable_runner): - """Test polling stops at max attempts.""" - with durable_runner: - result = durable_runner.run(input={"target": 10}, timeout=30) - - assert result.status is InvocationStatus.SUCCEEDED - - final_result = deserialize_operation_payload(result.result) - assert final_result["attempts"] == 5 - assert final_result["state"] == 5 - assert final_result["reached_target"] is False -``` - -[↑ Back to top](#table-of-contents) - -## FAQ - -**Q: How do I test workflows with long waits?** - -A: The test runner doesn't actually wait. You can verify wait operations are configured correctly without waiting for them to complete. - -**Q: Can I test workflows with external API calls?** - -A: Yes, but mock external dependencies in your tests. The test runner executes your code locally, so standard Python mocking works. - -**Q: What's the best way to test conditional logic?** - -A: Write separate tests for each execution path. Use descriptive test names and verify the specific operations that should execute in each path. 
- -**Q: How do I verify operation ordering?** - -A: Iterate through `result.operations` and check the order. You can also use operation names to verify specific sequences. - -**Q: What timeout should I use?** - -A: Use a timeout slightly longer than expected execution time. For most tests, 30-60 seconds is sufficient. - -**Q: How do I test error recovery?** - -A: Test both the failure case (verify the error is raised) and the recovery case (verify retry succeeds). Use separate tests for each scenario. - -[↑ Back to top](#table-of-contents) - -## See also - -- [Basic test patterns](basic-tests.md) - Simple testing patterns -- [Best practices](../best-practices.md) - Testing recommendations -- [Steps](../core/steps.md) - Step operations -- [Wait operations](../core/wait.md) - Wait operations -- [Callbacks](../core/callbacks.md) - Callback operations -- [Child contexts](../core/child-contexts.md) - Child context operations -- [Parallel operations](../core/parallel.md) - Parallel execution - -[↑ Back to top](#table-of-contents) - -## License - -See the [LICENSE](../../LICENSE) file for our project's licensing. - -[↑ Back to top](#table-of-contents) diff --git a/docs/testing-patterns/stores.md b/docs/testing-patterns/stores.md deleted file mode 100644 index 4b8d2844..00000000 --- a/docs/testing-patterns/stores.md +++ /dev/null @@ -1,262 +0,0 @@ -# Execution Stores - -## Table of Contents - -- [Overview](#overview) -- [Available stores](#available-stores) -- [In-memory store](#in-memory-store) -- [Filesystem store](#filesystem-store) -- [Choosing a store](#choosing-a-store) -- [Configuration](#configuration) -- [FAQ](#faq) -- [See also](#see-also) - -[← Back to main index](../index.md) - -## Overview - -Execution stores manage how test execution data is persisted during testing. The testing SDK (`aws-durable-execution-sdk-python-testing`) provides different store implementations for different testing scenarios. 
By default, tests use an in-memory store that's fast and doesn't require cleanup. For scenarios where you need persistence across test runs or want to inspect execution history, you can use a filesystem store. - -More store types will be added in future releases to support additional testing scenarios. - -[↑ Back to top](#table-of-contents) - -## Available stores - -The SDK currently provides two store implementations: - -- **In-memory store** - Fast, ephemeral storage for standard testing (default) -- **Filesystem store** - Persistent storage that saves executions to disk - -Additional store types may be added in future releases. - -[↑ Back to top](#table-of-contents) - -## In-memory store - -The in-memory store keeps execution data in memory during test runs. It's the default store and works well for most testing scenarios. - -### Characteristics - -- **Fast** - No disk I/O overhead -- **Ephemeral** - Data is lost when tests complete -- **Thread-safe** - Uses locks for concurrent access -- **No cleanup needed** - Memory is automatically freed - -### When to use - -Use the in-memory store when: -- Running standard unit tests -- You don't need to inspect executions after tests complete -- You want the fastest test execution -- You're running tests in CI/CD pipelines - -### Example - -The in-memory store is used by default: - -```python -import pytest -from aws_durable_execution_sdk_python.execution import InvocationStatus - -@pytest.mark.durable_execution( - handler=my_handler, - lambda_function_name="my_function", -) -def test_with_memory_store(durable_runner): - """Test uses in-memory store by default.""" - with durable_runner: - result = durable_runner.run(input={"data": "test"}, timeout=10) - - assert result.status is InvocationStatus.SUCCEEDED -``` - -[↑ Back to top](#table-of-contents) - -## Filesystem store - -The filesystem store persists execution data to disk as JSON files. 
Each execution is saved in a separate file, making it easy to inspect execution history. - -### Characteristics - -- **Persistent** - Data survives test runs -- **Inspectable** - JSON files can be viewed and analyzed -- **Configurable location** - Choose where files are stored -- **Automatic directory creation** - Creates storage directory if needed - -### When to use - -Use the filesystem store when: -- Debugging complex test failures -- You need to inspect execution history -- Running integration tests that span multiple sessions -- Analyzing execution patterns over time - -### Example - -Configure the filesystem store using environment variables: - -```console -# Set store type to filesystem -export AWS_DEX_STORE_TYPE=filesystem - -# Optionally set custom storage directory (defaults to .durable_executions) -export AWS_DEX_STORE_PATH=./test-executions - -# Run tests -pytest tests/ -``` - -Or configure it programmatically when using the cloud test runner: - -```python -from aws_durable_execution_sdk_python_testing.runner import ( - DurableFunctionCloudTestRunner, - DurableFunctionCloudTestRunnerConfig, -) -from aws_durable_execution_sdk_python_testing.stores.base import StoreType - -config = DurableFunctionCloudTestRunnerConfig( - function_name="my-function", - region="us-west-2", - store_type=StoreType.FILESYSTEM, - store_path="./my-test-executions", -) - -runner = DurableFunctionCloudTestRunner(config=config) -``` - -### Storage format - -Executions are stored as JSON files with sanitized ARN names: - -``` -.durable_executions/ -├── arn_aws_states_us-west-2_123456789012_execution_my-function_abc123.json -├── arn_aws_states_us-west-2_123456789012_execution_my-function_def456.json -└── arn_aws_states_us-west-2_123456789012_execution_my-function_ghi789.json -``` - -Each file contains the complete execution state including operations, checkpoints, and results. 
- -[↑ Back to top](#table-of-contents) - -## Choosing a store - -Use this guide to choose the right store for your needs: - -| Scenario | Recommended Store | Reason | -|----------|------------------|---------| -| Unit tests | In-memory | Fast, no cleanup needed | -| CI/CD pipelines | In-memory | Fast, ephemeral | -| Debugging failures | Filesystem | Inspect execution history | -| Integration tests | Filesystem | Persist across sessions | -| Performance testing | In-memory | Minimize I/O overhead | -| Execution analysis | Filesystem | Analyze patterns over time | - -[↑ Back to top](#table-of-contents) - -## Configuration - -### Environment variables - -Configure stores using environment variables: - -```console -# Store type (memory or filesystem) -export AWS_DEX_STORE_TYPE=filesystem - -# Storage directory for filesystem store (optional, defaults to .durable_executions) -export AWS_DEX_STORE_PATH=./test-executions -``` - -### Programmatic configuration - -Configure stores when creating a cloud test runner: - -```python -from aws_durable_execution_sdk_python_testing.runner import ( - DurableFunctionCloudTestRunner, - DurableFunctionCloudTestRunnerConfig, -) -from aws_durable_execution_sdk_python_testing.stores.base import StoreType - -# In-memory store (default) -config = DurableFunctionCloudTestRunnerConfig( - function_name="my-function", - region="us-west-2", - store_type=StoreType.MEMORY, -) - -# Filesystem store -config = DurableFunctionCloudTestRunnerConfig( - function_name="my-function", - region="us-west-2", - store_type=StoreType.FILESYSTEM, - store_path="./my-executions", -) - -runner = DurableFunctionCloudTestRunner(config=config) -``` - -### Default values - -If not specified: -- Store type defaults to `MEMORY` -- Filesystem store path defaults to `.durable_executions` - -[↑ Back to top](#table-of-contents) - -## FAQ - -**Q: Can I switch stores between test runs?** - -A: Yes, you can change the store type at any time. 
However, executions stored in one store won't be available in another. - -**Q: Does the filesystem store clean up old executions?** - -A: No, the filesystem store doesn't automatically delete old executions. You need to manually clean up the storage directory when needed. - -**Q: Can I use the filesystem store with the local test runner?** - -A: The filesystem store is primarily designed for the cloud test runner. The local test runner uses an in-memory store by default. - -**Q: Are execution files human-readable?** - -A: Yes, execution files are stored as formatted JSON and can be opened in any text editor. - -**Q: What happens if the storage directory doesn't exist?** - -A: The filesystem store automatically creates the directory if it doesn't exist. - -**Q: Can I use a custom store implementation?** - -A: The SDK defines an `ExecutionStore` protocol that you can implement for custom storage backends. However, this is an advanced use case. - -**Q: Will more store types be added?** - -A: Yes, additional store types may be added in future releases to support more testing scenarios. - -**Q: Does the in-memory store support concurrent tests?** - -A: Yes, the in-memory store is thread-safe and supports concurrent test execution. - -**Q: How much disk space does the filesystem store use?** - -A: Each execution typically uses a few KB to a few MB depending on the number of operations and data size. Monitor your storage directory if running many tests. - -[↑ Back to top](#table-of-contents) - -## See also - -- [Basic tests](basic-tests.md) - Simple test patterns -- [Testing modes](../advanced/testing-modes.md) - Local and cloud test execution -- [Best practices](../best-practices.md) - Testing recommendations - -[↑ Back to top](#table-of-contents) - -## License - -See the [LICENSE](../../LICENSE) file for our project's licensing. - -[↑ Back to top](#table-of-contents)