---
AWSTemplateFormatVersion: "2010-09-09"

Description: |-
  Cross-account, cross-region backups with AWS Backup and EventBridge

  github.com/sqlxpert/backup-events-aws  GPLv3  Copyright Paul Marcelin

Parameters:

  PlaceholderSuggestedStackName:
    Type: String
    Default: "BackupEvents"

  PlaceholderSuggestedStackSetDescription:
    Type: String
    Default: "Cross-account, cross-region backups with AWS Backup and EventBridge"

  PlaceholderHelp:
    Type: String
    Default: "github.com/sqlxpert/backup-events-aws"

  EnableCopy:
    Type: String
    Description: >-
      Original backups completed while this is false will never be copied
      to the backup account. Copies completed while this is false will
      never be copied to the backup region. No catch-up provision!
    Default: "true"
    AllowedValues:
      - "false"
      - "true"

  EnableUpdateLifecycle:
    Type: String
    Description: >-
      Original backups completed while this is false will never have their
      lifecycles updated for early deletion. 3 copies (the usual 2 copies
      in the backup account, plus the original backup, in its original
      account) will be retained according to the backup's initial
      lifecycle. No catch-up provision!
    Default: "true"
    AllowedValues:
      - "false"
      - "true"

  OrgId:
    Type: String
    Description: >-
      Example: "o-abcde12345"
    AllowedPattern: '.+'
    ConstraintDescription: Must not be blank

  BackupAccountId:
    Type: String
    Description: >-
      An account different from the one(s) containing resources to back
      up. Backups will first be copied to this account, with no change in
      region.
    AllowedPattern: '.+'
    ConstraintDescription: Must not be blank

  BackupRegion:
    Type: String
    Description: >-
      Example: "us-east-1". Within the backup account, backup copies that
      originated in other regions will be copied to this second region.
    AllowedPattern: '.+'
    ConstraintDescription: Must not be blank

  BackupRegionAlternate:
    Type: String
    Description: >-
      Within the backup account, backup copies that originated in the
      backup region will be copied to this alternate second region.
    AllowedPattern: '.+'
    ConstraintDescription: Must not be blank

  NewDeleteAfterDays:
    Type: Number
    Description: >-
      After a backup has been copied to the backup account, the original
      backup can be deleted; its lifecycle will be updated. Count days
      from when it was created. For incremental backups -- such as EBS
      volume snapshots (including those making up EC2 images (AMIs)), RDS
      (but not Aurora) database snapshots, and EFS file system backups --
      do not reduce this below the number of days between scheduled AWS
      Backup backups. See
      https://docs.aws.amazon.com/aws-backup/latest/devguide/metering-and-billing.html
    MinValue: 1
    Default: 7

  PlaceholderAdvancedParameters:
    Type: String
    Default: ""
    AllowedValues:
      - ""

  CreateSampleVault:
    Type: String
    Description: >-
      (1) Within the backup account, vaults are required in the backup and
      alternate backup regions. For every region containing resources to
      back up, a vault is required in the backup account (and its vault
      access policy must allow the backup role in every resource account
      to "backup:CopyIntoBackupVault"), as well as in every resource
      account (to receive original backups). For a scalable solution, all
      vaults must be encrypted with the same customer-managed multi-region
      KMS key, and the key policies of the primary and regional replica
      keys must allow cross-account key usage, usage for AWS Backup, and
      usage for underlying AWS services (depending on the types of
      resources that you back up).
      (2) By default, the sample vaults are encrypted with the AWS-managed
      "aws/backup" KMS key in each account and region, so they can only be
      used to demonstrate backing up unencrypted EFS file systems.
      (3) Combining KMS encryption, AWS Backup, multiple resource types,
      multiple regions, and multiple accounts is an advanced topic. See
      https://docs.aws.amazon.com/aws-backup/latest/devguide/encryption.html
    Default: "true"
    AllowedValues:
      - "false"
      - "true"

  VaultCustomKmsKey:
    Type: String
    Description: >-
      Generally, leave blank. If you are using the sample vaults but
      bringing your own key (BYOK), specify "ACCOUNT:key/KEY_ID", where
      KEY_ID begins with "mrk-". This must be a suitably-configured
      customer-managed multi-region KMS key. See CreateSampleVault (above)
      for the requirements.
    Default: ""

  VaultName:
    Type: String
    Description: >-
      (Not ARN!) All required vaults must have the same name. Change this
      if you are bringing your own vaults (BYOV), or if you are using the
      sample vaults but you need to create multiple stacks from the same
      template, in the same accounts and regions, for example, during a
      blue/green deployment.
    Default: "BackupEvents-Sample"

  CopyRoleName:
    Type: String
    Description: >-
      (Not ARN!) The AWS Backup service will assume this role when copying
      backups. A role of the same name is required in each account that
      contains resources to back up, and also in the backup account.
      (Roles are account-wide, not regional.) For possible one-time setup
      steps before using "service-role/AWSBackupDefaultServiceRole", see
      https://docs.aws.amazon.com/aws-backup/latest/devguide/iam-service-roles.html#default-service-roles
    Default: "service-role/AWSBackupDefaultServiceRole"

  OnlyResourceAccountId:
    Type: String
    Description: >-
      The only account that contains resources to back up. Leave blank if
      your resources are spread across multiple accounts, in which case
      less specific (organization-wide) permissions will be used.

  LogsRetainDays:
    Type: Number
    Description: >-
      See retentionInDays in
      http://docs.aws.amazon.com/AmazonCloudWatchLogs/latest/APIReference/API_PutRetentionPolicy.html
    Default: 7

  LogLevel:
    Type: String
    Description: >-
      See https://docs.python.org/3/library/logging.html#levels
    Default: ERROR
    AllowedValues:
      - CRITICAL
      - ERROR
      - WARNING
      - INFO
      - DEBUG
      - NOTSET

  CloudWatchLogsKmsKey:
    Type: String
    Description: >-
      If this is blank, function logs receive CloudWatch default non-KMS
      encryption. To use a customer-managed, multi-region KMS encryption
      key instead, specify "ACCOUNT:key/KEY_ID", where KEY_ID begins with
      "mrk-". The primary, or a replica key, must exist in regions
      containing resources to back up. Key policy updates are necessary.
      See
      https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/encrypt-log-data-kms.html#cmk-permissions
    Default: ""

  SqsKmsKey:
    Type: String
    Description: >-
      If this is blank, EventBridge events sent to the "dead letter" queue
      receive SQS default non-KMS encryption. To use the AWS-managed
      default KMS key, specify "alias/aws/sqs". To use a customer-managed,
      multi-region KMS key, specify "ACCOUNT:key/KEY_ID", where KEY_ID
      begins with "mrk-". The primary, or a replica key, must exist in
      regions containing resources to back up. Key policy updates are
      necessary. See
      https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-key-management.html#compatibility-with-aws-services
    Default: ""

  CopyLambdaFnMemoryMB:
    Type: Number
    Description: >-
      Increase this only in case of out-of-memory errors. See
      https://docs.aws.amazon.com/lambda/latest/operatorguide/computing-power.html
    Default: 128

  CopyLambdaFnTimeoutSecs:
    Type: Number
    Description: >-
      Increase this only in case of time-out errors. See
      https://aws.amazon.com/about-aws/whats-new/2018/10/aws-lambda-supports-functions-that-can-run-up-to-15-minutes/
    Default: 30

  UniqueNamePrefix:
    Type: String
    Description: >-
      Change this only if you need to create multiple stacks from the same
      template, in the same accounts and regions, for example, during a
      blue/green deployment.
    Default: "BackupEvents"

  UpdateLifecycleLambdaFnMemoryMB:
    Type: Number
    Description: >-
      Increase this only in case of out-of-memory errors.
    Default: 128

  UpdateLifecycleLambdaFnTimeoutSecs:
    Type: Number
    Description: >-
      Increase this only in case of time-out errors.
    Default: 60

Metadata:
  AWS::CloudFormation::Interface:
    ParameterGroups:
      - Label: { default: For Reference }
        Parameters:
          - PlaceholderSuggestedStackName
          - PlaceholderSuggestedStackSetDescription
          - PlaceholderHelp
      - Label: { default: Essential }
        Parameters:
          - EnableCopy
          - EnableUpdateLifecycle
          - OrgId
          - BackupAccountId
          - BackupRegion
          - BackupRegionAlternate
          - NewDeleteAfterDays
      - Label: { default: Advanced... }
        Parameters:
          - PlaceholderAdvancedParameters
      - Label: { default: Vaults }
        Parameters:
          - CreateSampleVault
          - VaultCustomKmsKey
          - VaultName
          - CopyRoleName
          - OnlyResourceAccountId
      - Label: { default: Both AWS Lambda functions... }
        Parameters:
          - LogsRetainDays
          - LogLevel
          - CloudWatchLogsKmsKey
          - SqsKmsKey
      - Label: { default: Function to start copying a backup }
        Parameters:
          - CopyLambdaFnMemoryMB
          - CopyLambdaFnTimeoutSecs
          - UniqueNamePrefix
      - Label: { default: Function to schedule a backup for deletion }
        Parameters:
          - UpdateLifecycleLambdaFnMemoryMB
          - UpdateLifecycleLambdaFnTimeoutSecs
    ParameterLabels:
      PlaceholderSuggestedStackName:
        default: Suggested stack or StackSet name
      PlaceholderSuggestedStackSetDescription:
        default: Suggested StackSet description
      PlaceholderHelp:
        default: For help, see
      EnableCopy:
        default: Enable copying of backups
      EnableUpdateLifecycle:
        default: Enable backup retention reduction
      OrgId:
        default: AWS Organization ID
      BackupAccountId:
        default: Backup AWS account
      BackupRegion:
        default: Backup region
      BackupRegionAlternate:
        default: Alternate for backup region
      NewDeleteAfterDays:
        default: Days (from creation) to keep original backups
      PlaceholderAdvancedParameters:
        default: Do not change the parameters below, unless necessary!
      CreateSampleVault:
        default: Create sample vaults?
      VaultCustomKmsKey:
        default: Custom KMS key for sample vaults
      VaultName:
        default: Vault name
      CopyRoleName:
        default: IAM role name for copying backups
      OnlyResourceAccountId:
        default: Only AWS account with resources to back up
      LogsRetainDays:
        default: Days before deleting logs
      LogLevel:
        default: Level of detail in logs
      CloudWatchLogsKmsKey:
        default: KMS encryption key for logs
      SqsKmsKey:
        default: KMS encryption key for event errors
      CopyLambdaFnMemoryMB:
        default: Megabytes of memory
      CopyLambdaFnTimeoutSecs:
        default: Seconds before timeout
      UniqueNamePrefix:
        default: Unique resource name prefix
      UpdateLifecycleLambdaFnMemoryMB:
        default: Megabytes of memory
      UpdateLifecycleLambdaFnTimeoutSecs:
        default: Seconds before timeout

Rules:

  BackupRegionNotEqualsAlternate:
    Assertions:
      - Assert: !Not [ !Equals [ !Ref BackupRegion, !Ref BackupRegionAlternate ] ]
        AssertDescription: >-
          BackupRegion and BackupRegionAlternate must be different.
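  # The next rule encodes the implication "EnableUpdateLifecycle implies
  # EnableCopy" as NOT (EnableUpdateLifecycle AND NOT EnableCopy): original
  # backups must never be scheduled for early deletion unless copying to
  # the backup account is enabled.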
  EnableUpdateLifecycleRequiresEnableCopy:
    Assertions:
      - Assert:
          Fn::Not:
            - Fn::And:
                - !Equals [ !Ref EnableUpdateLifecycle, "true" ]
                - !Not [ !Equals [ !Ref EnableCopy, "true" ] ]
        AssertDescription: >-
          EnableUpdateLifecycle requires EnableCopy.

Conditions:

  EnableCopyTrue: !Equals [ !Ref EnableCopy, "true" ]
  EnableUpdateLifecycleTrue: !Equals [ !Ref EnableUpdateLifecycle, "true" ]

  InBackupRegion: !Equals [ !Ref AWS::Region, !Ref BackupRegion ]
  InBackupAccount: !Equals [ !Ref AWS::AccountId, !Ref BackupAccountId ]
  NotInBackupAccount: !Not [ !Condition InBackupAccount ]

  CreateSampleVaultTrue: !Equals [ !Ref CreateSampleVault, "true" ]
  VaultCustomKmsKeyNo: !Equals [ !Ref VaultCustomKmsKey, "" ]

  OnlyResourceAccountIdBlank: !Equals [ !Ref OnlyResourceAccountId, "" ]

  CloudWatchLogsKmsKeyNo: !Equals [ !Ref CloudWatchLogsKmsKey, "" ]

  SqsKmsKeyNo: !Equals [ !Ref SqsKmsKey, "" ]
  SqsKmsKeyCustom:
    Fn::And:
      - !Not [ !Condition SqsKmsKeyNo ]
      - !Not [ !Equals [ !Ref SqsKmsKey, "alias/aws/sqs" ] ]

Resources:

  # Administrator: Restrict iam:PassRole to prevent arbitrary use of these
  # roles

  InvokeCopyLambdaFnInBackupAcctRole:
    Type: AWS::IAM::Role
    Condition: NotInBackupAccount
    Properties:
      # Cross-account:
      RoleName: !Sub "${UniqueNamePrefix}-InvokeCopyLambdaFnInBackupAcctRole-${AWS::Region}"
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Principal: { Service: events.amazonaws.com }
            Action: sts:AssumeRole
            Condition:
              StringEquals:
                "aws:ResourceOrgID": !Ref OrgId
              ArnLike:
                "aws:SourceArn": !Sub "arn:aws:events:${AWS::Region}:*:rule/${UniqueNamePrefix}-Copy1ToBackupAcctCompletedCopyLambdaFnEvRule"
                # Confused deputy prevention; fixed rule name avoids
                # circular dependency
      Policies:
        - PolicyName: LambdaInvoke
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              - Effect: Allow
                Action: lambda:InvokeFunction
                Resource: !Sub "arn:${AWS::Partition}:lambda:${AWS::Region}:${BackupAccountId}:function:${UniqueNamePrefix}-CopyLambdaFn"
                Condition:
                  StringEquals:
                    "aws:ResourceOrgID": !Ref OrgId

  EventTargetErrorQueuePol:
    Type: AWS::SQS::QueuePolicy
    Condition: NotInBackupAccount
    Properties:
      Queues: [ !Ref EventTargetErrorQueue ]
      PolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Sid: RequireTls
            Effect: Deny
            Principal: "*"
            Action: sqs:*
            Resource: "*"
            Condition:
              Bool: { "aws:SecureTransport": "false" }
          - Effect: Allow
            Principal: "*"
            Action: sqs:GetQueueAttributes
            Resource: "*"
            Condition:
              StringEquals: { "aws:PrincipalOrgId": !Ref OrgId }
          - Sid: DeadLetterSource
            Effect: Allow
            Principal: "*"
            Action: sqs:SendMessage
            Resource: "*"
            Condition:
              ArnEquals:
                "aws:SourceArn":
                  - !GetAtt BackupCompletedCopyLambdaFnEvRule.Arn
                  - !GetAtt Copy1CompletedUpdateLifecycleLambdaFnEvRule.Arn
                  - !GetAtt Copy1ToBackupAcctCompletedCopyLambdaFnEvRule.Arn
          - Sid: ExclusiveSource
            Effect: Deny
            Principal: "*"
            Action: sqs:SendMessage
            Resource: "*"
            Condition:
              ArnNotEquals:
                "aws:SourceArn":
                  - !GetAtt BackupCompletedCopyLambdaFnEvRule.Arn
                  - !GetAtt Copy1CompletedUpdateLifecycleLambdaFnEvRule.Arn
                  - !GetAtt Copy1ToBackupAcctCompletedCopyLambdaFnEvRule.Arn

  CopyLambdaFnRole:
    Type: AWS::IAM::Role
    Properties:
      Description: !Sub "For ${AWS::Region} region"
      # Conspicuous in the AWS Console. Prefer CloudFormation naming and one
      # role per region, for a role that is not referenced from other AWS
      # accounts. One all-region role would: be a little less secure,
      # require an extra parameter (region in which to house roles), and
      # create a cross-region deployment order dependency.
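      # Without an explicit RoleName, CloudFormation generates a unique name
      # (typically of the form {StackName}-CopyLambdaFnRole-{RANDOM_SUFFIX};
      # illustrative), so per-region stacks cannot collide on role names.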
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Principal: { Service: lambda.amazonaws.com }
            Action: sts:AssumeRole
      Policies:
        - PolicyName: CloudWatchLogsCreateLogGroupIfDeleted
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              - Effect: Allow
                Action:
                  - logs:CreateLogGroup
                Resource: !GetAtt CopyLambdaFnLogGrp.Arn
        - PolicyName: CloudWatchLogsWrite
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              - Effect: Allow
                Action:
                  - logs:CreateLogStream
                  - logs:PutLogEvents
                Resource: !Sub "arn:${AWS::Partition}:logs:${AWS::Region}:${AWS::AccountId}:log-group:${CopyLambdaFnLogGrp}:log-stream:*"
                # !GetAtt LogGroup.Arn ends with :* instead of allowing us to
                # append :log-stream:* to make a log stream ARN
        - PolicyName: BackupCopy
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              - Effect: Allow
                Action: backup:StartCopyJob
                Resource: "*"
                Condition:
                  StringEquals:
                    "aws:ResourceAccount": !Ref AWS::AccountId
                    # This is a global ("aws:"), not AWS Backup ("backup:")
                    # condition key. "Resource" refers to the original backup
                    # or the 1st copy, not to the "protected resource" (that
                    # is, not to the resource that was backed up).
              - Effect: Allow
                Action: iam:PassRole
                Resource: !Sub "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/${CopyRoleName}"
                Condition:
                  StringLike: { "iam:PassedToService": backup.amazonaws.com }

  UpdateLifecycleLambdaFnRole:
    Type: AWS::IAM::Role
    Condition: NotInBackupAccount
    Properties:
      Description: !Sub "For ${AWS::Region} region"  # Conspicuous Console
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Principal: { Service: lambda.amazonaws.com }
            Action: sts:AssumeRole
      Policies:
        - PolicyName: CloudWatchLogsCreateLogGroupIfDeleted
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              - Effect: Allow
                Action:
                  - logs:CreateLogGroup
                Resource: !GetAtt UpdateLifecycleLambdaFnLogGrp.Arn
        - PolicyName: CloudWatchLogsWrite
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              - Effect: Allow
                Action:
                  - logs:CreateLogStream
                  - logs:PutLogEvents
                Resource: !Sub "arn:${AWS::Partition}:logs:${AWS::Region}:${AWS::AccountId}:log-group:${UpdateLifecycleLambdaFnLogGrp}:log-stream:*"
                # !GetAtt LogGroup.Arn ends with :* instead of allowing us to
                # append :log-stream:* to make a log stream ARN
        - PolicyName: BackupRead
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              - Effect: Allow
                Action: backup:DescribeRecoveryPoint
                Resource: "*"
                Condition:
                  StringEquals:
                    "aws:ResourceAccount": !Ref AWS::AccountId
                    # This is a global ("aws:"), not AWS Backup ("backup:")
                    # condition key. "Resource" refers to the original backup,
                    # not to the "protected resource" (that is, not to the
                    # resource that was backed up).
        - PolicyName: BackupWrite
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              - Effect: Allow
                Action: backup:UpdateRecoveryPointLifecycle
                Resource: "*"
                Condition:
                  StringEquals:
                    "aws:ResourceAccount": !Ref AWS::AccountId
                    # This is a global ("aws:"), not AWS Backup ("backup:")
                    # condition key. "Resource" refers to the original backup,
                    # not to the "protected resource" (that is, not to the
                    # resource that was backed up).

  SampleVault:
    Type: AWS::Backup::BackupVault
    Condition: CreateSampleVaultTrue
    Properties:
      BackupVaultName: !Ref VaultName
      EncryptionKeyArn:
        Fn::If:
          - VaultCustomKmsKeyNo
          - !Ref AWS::NoValue
          - !Sub "arn:${AWS::Partition}:kms:${AWS::Region}:${VaultCustomKmsKey}"
      BackupVaultTags:
        Note:
          Fn::If:
            - VaultCustomKmsKeyNo
            - Fn::If:
                - InBackupAccount
                - Fn::If:
                    - InBackupRegion
                    - >-
                      Only for copies of copies of EFS backups. Prior copy:
                      same account and other region and encrypted with the
                      AWS-managed default aws/backup key for that region.
                      This copy: encrypted with the aws/backup key for this
                      region.
                    - >-
                      Only for copies of EFS backups. Original backup: other
                      account and same region and unencrypted. This copy:
                      encrypted with the AWS-managed default aws/backup key
                      for this account and this region.
                - >-
                  Only for backups of unencrypted EFS file systems. File
                  system: same account and region and unencrypted. This
                  backup: encrypted with the aws/backup key for this region.
            - !Ref AWS::NoValue
      AccessPolicy:
        Version: "2012-10-17"
        Statement:
          - Fn::If:
              - InBackupAccount
              - Sid: CopyIntoBackupVaultAllowServiceRole
                Effect: Allow
                Principal: "*"
                Condition:
                  StringEquals:
                    "aws:PrincipalOrgId": !Ref OrgId
                  ArnLike:
                    "aws:PrincipalArn": !Sub "arn:aws:iam::*:role/${CopyRoleName}"
                Action: backup:CopyIntoBackupVault
                Resource: "*"
              - !Ref AWS::NoValue
          - Sid: CopyFromBackupVaultExclusiveCopyTarget
            Effect: Deny
            Principal: "*"
            Action: backup:CopyFromBackupVault
            Resource: "*"
            Condition:
              "ForAllValues:ArnNotEquals":
                "backup:CopyTargets":
                  # See also VaultARN and VAULT_ARN
                  - Fn::If:
                      - InBackupAccount
                      - Fn::If:
                          - InBackupRegion
                          - !Sub "arn:${AWS::Partition}:backup:${BackupRegionAlternate}:${AWS::AccountId}:backup-vault:${VaultName}"
                          - !Sub "arn:${AWS::Partition}:backup:${BackupRegion}:${AWS::AccountId}:backup-vault:${VaultName}"
                      - !Sub "arn:${AWS::Partition}:backup:${AWS::Region}:${BackupAccountId}:backup-vault:${VaultName}"

  EventTargetErrorQueue:
    Type: AWS::SQS::Queue
    Condition: NotInBackupAccount
    Properties:
      DelaySeconds: 0
      MessageRetentionPeriod: 604800  # seconds (7 days)
      ReceiveMessageWaitTimeSeconds: 20  # long polling (lowest cost)
      VisibilityTimeout: 60  # seconds
      SqsManagedSseEnabled: !If [ SqsKmsKeyNo, true, false ]
      KmsMasterKeyId:
        Fn::If:
          - SqsKmsKeyNo
          - !Ref AWS::NoValue
          - Fn::If:
              - SqsKmsKeyCustom
              - !Sub "arn:${AWS::Partition}:kms:${AWS::Region}:${SqsKmsKey}"
              - !Ref SqsKmsKey
      KmsDataKeyReusePeriodSeconds:
        Fn::If:
          - SqsKmsKeyNo
          - !Ref AWS::NoValue
          - 86400  # seconds (24 hours)

  CopyLambdaFnLogGrp:
    Type: AWS::Logs::LogGroup
    Properties:
      RetentionInDays: !Ref LogsRetainDays
      KmsKeyId:
        Fn::If:
          - CloudWatchLogsKmsKeyNo
          - !Ref AWS::NoValue
          - !Sub "arn:${AWS::Partition}:kms:${AWS::Region}:${CloudWatchLogsKmsKey}"

  UpdateLifecycleLambdaFnLogGrp:
    Type: AWS::Logs::LogGroup
    Condition: NotInBackupAccount
    Properties:
      RetentionInDays: !Ref LogsRetainDays
      KmsKeyId:
        Fn::If:
          - CloudWatchLogsKmsKeyNo
          - !Ref AWS::NoValue
          - !Sub "arn:${AWS::Partition}:kms:${AWS::Region}:${CloudWatchLogsKmsKey}"

  BackupCompletedCopyLambdaFnEvRule:
    Type: AWS::Events::Rule
    Condition: NotInBackupAccount
    Properties:
      Description: >-
        After backup has been created, store 1st (same-region) copy in
        backup account
      EventPattern:
        source: [ aws.backup ]
        detail-type: [ Backup Job State Change ]
        detail:
          state: [ COMPLETED ]
          backupVaultArn:
            - !Sub "arn:${AWS::Partition}:backup:${AWS::Region}:${AWS::AccountId}:backup-vault:${VaultName}"
        version: [ "0" ]
      Targets:
        - Id: !Ref CopyLambdaFn
          Arn: !GetAtt CopyLambdaFn.Arn
          DeadLetterConfig: { Arn: !GetAtt EventTargetErrorQueue.Arn }
      State: !If [ EnableCopyTrue, ENABLED, DISABLED ]

  Copy1ToBackupAcctCompletedCopyLambdaFnEvRule:
    Type: AWS::Events::Rule
    Condition: NotInBackupAccount
    Properties:
      # Cross-account; fixed name also avoids circular dependency for confused
      # deputy prevention
      Name: !Sub "${UniqueNamePrefix}-Copy1ToBackupAcctCompletedCopyLambdaFnEvRule"
      Description: >-
        After 1st (same-region) copy of backup has been stored in backup
        account, store another copy in a 2nd region
      EventPattern:
        source: [ aws.backup ]
        detail-type: [ Copy Job State Change ]
        detail:
          state: [ COMPLETED ]
          sourceBackupVaultArn:
            - !Sub "arn:${AWS::Partition}:backup:${AWS::Region}:${AWS::AccountId}:backup-vault:${VaultName}"
          destinationBackupVaultArn:
            - !Sub "arn:${AWS::Partition}:backup:${AWS::Region}:${BackupAccountId}:backup-vault:${VaultName}"
        version: [ "0" ]
      Targets:
        - Id: !Sub "${UniqueNamePrefix}-CopyLambdaFn"
          RoleArn: !GetAtt InvokeCopyLambdaFnInBackupAcctRole.Arn
          Arn: !Sub "arn:${AWS::Partition}:lambda:${AWS::Region}:${BackupAccountId}:function:${UniqueNamePrefix}-CopyLambdaFn"
          DeadLetterConfig: { Arn: !GetAtt EventTargetErrorQueue.Arn }
      State: !If [ EnableCopyTrue, ENABLED, DISABLED ]

  # Administrator: Block other invocations

  BackupCompletedCopyLambdaFnEvRulePerm:
    Type: AWS::Lambda::Permission
    Condition: NotInBackupAccount
    Properties:
      SourceArn: !GetAtt BackupCompletedCopyLambdaFnEvRule.Arn
      Principal: events.amazonaws.com
      Action: lambda:InvokeFunction
      FunctionName: !Ref CopyLambdaFn

  Copy1ToBackupAcctCompletedCopyLambdaFnEvRulePerm:
    Type: AWS::Lambda::Permission
    Condition: InBackupAccount
    Properties:
      # AWS Lambda does not allow direct editing of a Lambda function's
      # resource access policy. The available AddPermission API does not
      # accept the * wildcard in the AccountId position of ARNs. Therefore,
      # unless we're expecting backups from only one account, we must allow
      # any Principal in the organization, against any SourceArn.
      # Administrator: Consider restricting by service control policy (SCP).
      SourceArn:
        Fn::If:
          - OnlyResourceAccountIdBlank
          - !Ref AWS::NoValue
          - !Sub "arn:aws:events:${AWS::Region}:${OnlyResourceAccountId}:rule/${UniqueNamePrefix}-Copy1ToBackupAcctCompletedCopyLambdaFnEvRule"
      PrincipalOrgID: !Ref OrgId
      Principal:
        Fn::If:
          - OnlyResourceAccountIdBlank
          - "*"
          - !Sub "arn:aws:iam::${OnlyResourceAccountId}:role/${UniqueNamePrefix}-InvokeCopyLambdaFnInBackupAcctRole-${AWS::Region}"
      Action: lambda:InvokeFunction
      FunctionName: !Ref CopyLambdaFn

  Copy1CompletedUpdateLifecycleLambdaFnEvRule:
    Type: AWS::Events::Rule
    Condition: NotInBackupAccount
    Properties:
      Description: >-
        After 1st (same-region) copy has been stored in backup account,
        schedule deletion of original backup
      EventPattern:
        source: [ aws.backup ]
        detail-type: [ Copy Job State Change ]
        detail:
          state: [ COMPLETED ]
          sourceBackupVaultArn:
            - !Sub "arn:${AWS::Partition}:backup:${AWS::Region}:${AWS::AccountId}:backup-vault:${VaultName}"
          destinationBackupVaultArn:
            - !Sub "arn:${AWS::Partition}:backup:${AWS::Region}:${BackupAccountId}:backup-vault:${VaultName}"
        version: [ "0" ]
      Targets:
        - Id: !Ref UpdateLifecycleLambdaFn
          Arn: !GetAtt UpdateLifecycleLambdaFn.Arn
          DeadLetterConfig: { Arn: !GetAtt EventTargetErrorQueue.Arn }
      State: !If [ EnableUpdateLifecycleTrue, ENABLED, DISABLED ]

  # Administrator: Block other invocations

  UpdateLifecycleLambdaFnPerm:
    Type: AWS::Lambda::Permission
    Condition: NotInBackupAccount
    Properties:
      SourceArn: !GetAtt Copy1CompletedUpdateLifecycleLambdaFnEvRule.Arn
      Principal: events.amazonaws.com
      Action: lambda:InvokeFunction
      FunctionName: !Ref UpdateLifecycleLambdaFn

  CopyLambdaFn:
    Type: AWS::Lambda::Function
    Properties:
      FunctionName:
        Fn::If:
          - InBackupAccount
          - !Sub "${UniqueNamePrefix}-CopyLambdaFn"  # Cross-account
          - !Ref AWS::NoValue  # Same-account, prefer CloudFormation naming
      Role: !GetAtt CopyLambdaFnRole.Arn
      Timeout: !Ref CopyLambdaFnTimeoutSecs
      MemorySize: !Ref CopyLambdaFnMemoryMB
      LoggingConfig:
        LogGroup: !Ref CopyLambdaFnLogGrp
        LogFormat: JSON
        SystemLogLevel: WARN
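        # With the JSON log format, SystemLogLevel filters the Lambda
        # platform's own entries while ApplicationLogLevel (below) filters
        # entries written by the function's logger.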
        ApplicationLogLevel: !Ref LogLevel
      Architectures:
        - arm64
      Runtime: python3.13
      # To avoid making users build a source bundle and distribute it to a
      # bucket in every target region (an AWS Lambda requirement when using
      # S3), supply shared, multi-handler source code in-line...
      Environment:
        Variables:
          NEW_DELETE_AFTER_DAYS: "0"  # Not needed by this handler
          COPY_ROLE_ARN: !Sub "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/${CopyRoleName}"
          BACKUP_VAULT_NAME: !Ref VaultName
          DESTINATION_BACKUP_VAULT_ARN:
            # See also VaultARN and backup:CopyTargets
            Fn::If:
              - InBackupAccount
              - Fn::If:
                  - InBackupRegion
                  - !Sub "arn:${AWS::Partition}:backup:${BackupRegionAlternate}:${AWS::AccountId}:backup-vault:${VaultName}"
                  - !Sub "arn:${AWS::Partition}:backup:${BackupRegion}:${AWS::AccountId}:backup-vault:${VaultName}"
              - !Sub "arn:${AWS::Partition}:backup:${AWS::Region}:${BackupAccountId}:backup-vault:${VaultName}"
      Handler: index.lambda_handler_copy
      Code:
        ZipFile: |
          #!/usr/bin/env python3
          """Cross-account, cross-region backups with AWS Backup and EventBridge

          github.com/sqlxpert/backup-events-aws  GPLv3  Copyright Paul Marcelin
          """

          import os
          import logging
          import time
          import json
          import datetime
          import botocore
          import boto3

          logger = logging.getLogger()
          # Skip "credentials in environment" INFO message, unavoidable in AWS Lambda:
          logging.getLogger("botocore").setLevel(logging.WARNING)

          os.environ["TZ"] = "UTC"
          time.tzset()
          # See get_update_lifecycle_kwargs and lambda_handler_update_lifecycle

          NEW_DELETE_AFTER_DAYS = int(os.environ["NEW_DELETE_AFTER_DAYS"])


          def get_backup_action_kwargs_base():
            """Get base kwargs for AWS Backup methods, from environment variables
            """
            backup_vault_name = os.environ["BACKUP_VAULT_NAME"]
            return {
              "DEFAULT": {"BackupVaultName": backup_vault_name},
              "start_copy_job": {
                "IamRoleArn": os.environ["COPY_ROLE_ARN"],
                "SourceBackupVaultName": backup_vault_name,
                "DestinationBackupVaultArn":
                  os.environ["DESTINATION_BACKUP_VAULT_ARN"],
              },
            }


          def log(entry_type, entry_value, log_level=logging.INFO):
            """Emit a JSON-format log entry
            """
            entry_value_out = json.loads(json.dumps(entry_value, default=str))
            # Avoids "Object of type datetime is not JSON serializable" in
            # https://github.com/aws/aws-lambda-python-runtime-interface-client/blob/9efb462/awslambdaric/lambda_runtime_log_utils.py#L109-L135
            #
            # The JSON encoder in the AWS Lambda Python runtime isn't configured to
            # serialize datetime values in responses returned by AWS's own Python SDK!
            #
            # Alternative considered:
            # https://docs.powertools.aws.dev/lambda/python/latest/core/logger/
            logger.log(
              log_level, "", extra={"type": entry_type, "value": entry_value_out}
            )


          def boto3_success(resp):
            """Take a boto3 response, return True if result was success

            Success means an AWS operation has started, not necessarily that it
            has completed. For example, it may take hours to copy a backup.
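
            Sketch of the check, against a hypothetical response dict:

              boto3_success({"ResponseMetadata": {"HTTPStatusCode": 200}})  # True
              boto3_success({})                                             # False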
            """
            return (
              isinstance(resp, dict)
              and isinstance(resp.get("ResponseMetadata"), dict)
              and (resp["ResponseMetadata"].get("HTTPStatusCode") == 200)
            )


          class Backup():
            """AWS Backup recovery point

            If the object is not also a subclass, then the recovery point is the
            original, from start_backup_job :
            https://docs.aws.amazon.com/aws-backup/latest/devguide/eventbridge.html#backup-job-state-change-completed
            """
            _boto3_client = None
            action_kwargs_base = get_backup_action_kwargs_base()
            _from_job_id_key = "backupJobId"

            def __init__(self, from_event):
              self._from_event = from_event

            @staticmethod
            def new(from_event):
              """Create a Backup or BackupCopy instance

              Takes a start_backup_job or start_copy_job state change COMPLETED
              event (EventBridge filters in CloudFormation allow only acceptable
              events)
              """
              new_object_class = (
                BackupCopy if from_event["detail-type"].startswith("Copy ") else Backup
              )
              return new_object_class(from_event)

            @classmethod
            def get_boto3_client(cls):
              """Create (if needed) and return a boto3 client for the AWS Backup service

              boto3 method references can only be resolved at run-time, against an
              instance of an AWS service's Client class.
              http://boto3.readthedocs.io/en/latest/guide/events.html#extensibility-guide

              Alternatives considered:
              https://github.com/boto/boto3/issues/3197#issue-1175578228
              https://github.com/aws-samples/boto-session-manager-project
              """
              if cls._boto3_client is None:
                cls._boto3_client = boto3.client(
                  "backup",
                  config=botocore.config.Config(retries={"mode": "standard"})
                )
              return cls._boto3_client

            # pylint: disable=missing-function-docstring

            @property
            def _from_job_details(self):
              return self._from_event.get("detail", {})

            @property
            def from_event(self):
              return self._from_event

            @property
            def from_job_id(self):
              return self._from_job_details.get(self._from_job_id_key, "")

            @property
            def from_backup_arn(self):
              # Reserve from_rsrc_arn for original backups
              return ""

            @property
            def arn(self):
              return self._from_event.get("resources", [""])[0]

            # pylint: enable=missing-function-docstring

            def valid(self):
              """Return True if all required attributes are non-empty

              A cursory validation, but EventBridge filters in CloudFormation
              allow only acceptable events, which come from AWS Backup.
              """
              return all([self.from_job_id, self.arn])  # More attributes coming!

            def log_action(self, action_name, action_kwargs, exception=None, resp=None):
              """Log the AWS Lambda event and the outcome of an action on a backup
              """
              log_level = logging.INFO if boto3_success(resp) else logging.ERROR
              log("LAMBDA_EVENT", self.from_event, log_level)
              log(f"{action_name.upper()}_KWARGS", action_kwargs, log_level)
              if exception is not None:
                log("EXCEPTION", exception, log_level)
              elif resp is not None:
                log("AWS_RESPONSE", resp, log_level)

            def do_action(
              self, action_name, kwargs_add={}, validate_backup=True
            ):  # pylint: disable=dangerous-default-value
              """Take an AWS Backup method and kwargs, log outcome, and return response
              """
              action_kwargs = self.action_kwargs_base.get(
                action_name, self.action_kwargs_base["DEFAULT"]
              ) | kwargs_add  # Copy, don't update!
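              # The | operator (PEP 584) builds a NEW dict, leaving the shared
              # class-level action_kwargs_base untouched; roughly equivalent to
              # {**base_kwargs, **kwargs_add}.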
              resp = None
              if validate_backup and not self.valid():
                self.log_action(action_name, action_kwargs)
              else:
                action_method = getattr(self.get_boto3_client(), action_name)
                try:
                  resp = action_method(**action_kwargs)
                except Exception as misc_exception:
                  self.log_action(action_name, action_kwargs, exception=misc_exception)
                  raise
                self.log_action(action_name, action_kwargs, resp=resp)
              return resp


          class BackupCopy(Backup):
            """AWS Backup recovery point copy, from start_copy_job

            https://docs.aws.amazon.com/aws-backup/latest/devguide/eventbridge.html#copy-job-state-change-completed

            Why didn't AWS use the same structure and keys for start_backup_job
            and the destination half of start_copy_job ? Both methods put a
            backup into a destination vault!
            """
            _from_job_id_key = "copyJobId"

            @property
            def from_backup_arn(self):
              # Reserve from_rsrc_arn for original backups
              return self._from_event.get("resources", [""])[0]

            @property
            def arn(self):  # pylint: disable=missing-function-docstring
              return self._from_job_details.get("destinationRecoveryPointArn", "")

            def valid(self):
              """Return True if all required attributes are non-empty

              A cursory validation, but EventBridge filters in CloudFormation
              allow only acceptable events, which come from AWS Backup.
              """
              return all([self.from_job_id, self.from_backup_arn, self.arn])


          def get_update_lifecycle_kwargs(describe_resp, today_date):
            """Take a describe response, return update_recovery_point_lifecycle kwargs

            Sets/reduces DeleteAfterDays, so a backup that has been copied to
            another vault can be scheduled for deletion from the original vault.
            If the result dict is empty, no lifecycle update is needed.

            Warnings:

            - Before calling describe_recovery_point , use tzset to set the local
              time zone to UTC, for correct results.

            - For safety, this function works in UTC whole days, stripping time
              and leaving a whole-day margin ( +1 and strict < inequality ).
              AWS Backup measures lifecycles (MoveToColdStorageAfterDays,
              DeleteAfterDays) in whole days, but CreationDate -- misnamed --
              includes a precise time, and then deletion occurs "at a randomly
              chosen point over the following 8 hours".
              https://docs.aws.amazon.com/aws-backup/latest/devguide/recov-point-create-on-demand-backup.html
            """
            kwargs_out = {}
            lifecycle = dict(describe_resp.get("Lifecycle", {}))  # Update the copy...
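            # Worked example (hypothetical values): CreationDate 2025-01-10
            # 08:00 UTC with today_date 2025-01-12 gives days_old = 2 + 1 = 3.
            # With NEW_DELETE_AFTER_DAYS = 7 and warm storage, DeleteAfterDays
            # becomes max(3, 1, 7) + 1 = 8, applied only if strictly less than
            # the backup's current DeleteAfterDays.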
            creation_date = describe_resp["CreationDate"].date()
            days_old = (today_date - creation_date).days + 1
            delete_after_days_minima = [days_old, 1, NEW_DELETE_AFTER_DAYS]
            delete_after_days_maximum = lifecycle.get("DeleteAfterDays")  # Don't delay

            storage_class = describe_resp.get("StorageClass")
            cold_storage_after_days = (
              lifecycle.get("MoveToColdStorageAfterDays")
              if lifecycle.get("OptInToArchiveForSupportedResources", False)
              else None
            )

            if storage_class == "DELETED":
              delete_after_days_maximum = 0
            elif cold_storage_after_days is not None:
              if (storage_class == "WARM") and (days_old < cold_storage_after_days):
                # Has not yet transitioned to cold storage, and is not scheduled
                # to, soon
                lifecycle.update({
                  "OptInToArchiveForSupportedResources": False,
                  "MoveToColdStorageAfterDays": -1,
                })
              else:
                # Has already transitioned to cold storage, or is scheduled to,
                # soon
                delete_after_days_minima.append(cold_storage_after_days + 90)
            elif storage_class == "COLD":
              # In case AWS Backup someday supports creation in/non-scheduled
              # move to cold storage, could have entered cold storage as late
              # as today
              delete_after_days_minima.append(days_old + 90)

            delete_after_days = max(delete_after_days_minima) + 1
            if (
              (delete_after_days_maximum is None)
              or (delete_after_days < delete_after_days_maximum)
            ):
              lifecycle["DeleteAfterDays"] = delete_after_days
              kwargs_out = {"Lifecycle": lifecycle}

            return kwargs_out


          def lambda_handler_copy(event, context):  # pylint: disable=unused-argument
            """Copy a backup to a vault in another AWS account OR another region
            """
            backup = Backup.new(event)
            backup.do_action(
              "start_copy_job",
              {
                "RecoveryPointArn": backup.arn,
                "IdempotencyToken": backup.from_job_id,
              }
            )


          def lambda_handler_update_lifecycle(event, context):  # pylint: disable=unused-argument
            """Schedule deletion of a backup that has been copied to another vault

            Warning:

            - Before calling describe_recovery_point , use tzset to set the local
              time zone to UTC, for correct results.
            """
            backup = Backup.new(event)
            kwargs_operand = {"RecoveryPointArn": backup.from_backup_arn}
            describe_resp = backup.do_action("describe_recovery_point", kwargs_operand)
            if boto3_success(describe_resp):
              kwargs_lifecycle = get_update_lifecycle_kwargs(
                describe_resp, datetime.date.today()
              )
              if kwargs_lifecycle:
                backup.do_action(
                  "update_recovery_point_lifecycle",
                  kwargs_lifecycle | kwargs_operand,
                  validate_backup=False
                )

          # ZIPFILE_END

  UpdateLifecycleLambdaFn:
    Type: AWS::Lambda::Function
    Condition: NotInBackupAccount
    Properties:
      Role: !GetAtt UpdateLifecycleLambdaFnRole.Arn
      Timeout: !Ref UpdateLifecycleLambdaFnTimeoutSecs
      MemorySize: !Ref UpdateLifecycleLambdaFnMemoryMB
      LoggingConfig:
        LogGroup: !Ref UpdateLifecycleLambdaFnLogGrp
        LogFormat: JSON
        SystemLogLevel: WARN
        ApplicationLogLevel: !Ref LogLevel
      Architectures:
        - arm64
      Runtime: python3.13
      # To avoid making users build a source bundle and distribute it to a
      # bucket in every target region (an AWS Lambda requirement when using
      # S3), supply shared, multi-handler source code in-line...
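      # (The in-line source below is identical to CopyLambdaFn's; only the
      # Handler and the environment variables select the update-lifecycle
      # behavior.)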
      Environment:
        Variables:
          NEW_DELETE_AFTER_DAYS: !Ref NewDeleteAfterDays
          COPY_ROLE_ARN: ""  # Not needed by this handler
          BACKUP_VAULT_NAME: !Ref VaultName
          DESTINATION_BACKUP_VAULT_ARN: ""  # Not needed by this handler
      Handler: index.lambda_handler_update_lifecycle
      Code:
        ZipFile: |
          #!/usr/bin/env python3
          """Cross-account, cross-region backups with AWS Backup and EventBridge

          github.com/sqlxpert/backup-events-aws  GPLv3  Copyright Paul Marcelin
          """

          import os
          import logging
          import time
          import json
          import datetime
          import botocore
          import boto3

          logger = logging.getLogger()
          # Skip "credentials in environment" INFO message, unavoidable in AWS Lambda:
          logging.getLogger("botocore").setLevel(logging.WARNING)

          os.environ["TZ"] = "UTC"
          time.tzset()
          # See get_update_lifecycle_kwargs and lambda_handler_update_lifecycle

          NEW_DELETE_AFTER_DAYS = int(os.environ["NEW_DELETE_AFTER_DAYS"])


          def get_backup_action_kwargs_base():
            """Get base kwargs for AWS Backup methods, from environment variables
            """
            backup_vault_name = os.environ["BACKUP_VAULT_NAME"]
            return {
              "DEFAULT": {"BackupVaultName": backup_vault_name},
              "start_copy_job": {
                "IamRoleArn": os.environ["COPY_ROLE_ARN"],
                "SourceBackupVaultName": backup_vault_name,
                "DestinationBackupVaultArn":
                  os.environ["DESTINATION_BACKUP_VAULT_ARN"],
              },
            }


          def log(entry_type, entry_value, log_level=logging.INFO):
            """Emit a JSON-format log entry
            """
            entry_value_out = json.loads(json.dumps(entry_value, default=str))
            # Avoids "Object of type datetime is not JSON serializable" in
            # https://github.com/aws/aws-lambda-python-runtime-interface-client/blob/9efb462/awslambdaric/lambda_runtime_log_utils.py#L109-L135
            #
            # The JSON encoder in the AWS Lambda Python runtime isn't configured to
            # serialize datetime values in responses returned by AWS's own Python SDK!
            #
            # Alternative considered:
            # https://docs.powertools.aws.dev/lambda/python/latest/core/logger/
            logger.log(
              log_level, "", extra={"type": entry_type, "value": entry_value_out}
            )


          def boto3_success(resp):
            """Take a boto3 response, return True if result was success

            Success means an AWS operation has started, not necessarily that it
            has completed. For example, it may take hours to copy a backup.
            """
            return (
              isinstance(resp, dict)
              and isinstance(resp.get("ResponseMetadata"), dict)
              and (resp["ResponseMetadata"].get("HTTPStatusCode") == 200)
            )


          class Backup():
            """AWS Backup recovery point

            If the object is not also a subclass, then the recovery point is the
            original, from start_backup_job :
            https://docs.aws.amazon.com/aws-backup/latest/devguide/eventbridge.html#backup-job-state-change-completed
            """
            _boto3_client = None
            action_kwargs_base = get_backup_action_kwargs_base()
            _from_job_id_key = "backupJobId"

            def __init__(self, from_event):
              self._from_event = from_event

            @staticmethod
            def new(from_event):
              """Create a Backup or BackupCopy instance

              Takes a start_backup_job or start_copy_job state change COMPLETED
              event (EventBridge filters in CloudFormation allow only acceptable
              events)
              """
              new_object_class = (
                BackupCopy if from_event["detail-type"].startswith("Copy ") else Backup
              )
              return new_object_class(from_event)

            @classmethod
            def get_boto3_client(cls):
              """Create (if needed) and return a boto3 client for the AWS Backup service

              boto3 method references can only be resolved at run-time, against an
              instance of an AWS service's Client class.
              http://boto3.readthedocs.io/en/latest/guide/events.html#extensibility-guide

              Alternatives considered:
              https://github.com/boto/boto3/issues/3197#issue-1175578228
              https://github.com/aws-samples/boto-session-manager-project
              """
              if cls._boto3_client is None:
                cls._boto3_client = boto3.client(
                  "backup",
                  config=botocore.config.Config(retries={"mode": "standard"})
                )
              return cls._boto3_client

            # pylint: disable=missing-function-docstring

            @property
            def _from_job_details(self):
              return self._from_event.get("detail", {})

            @property
            def from_event(self):
              return self._from_event

            @property
            def from_job_id(self):
              return self._from_job_details.get(self._from_job_id_key, "")

            @property
            def from_backup_arn(self):
              # Reserve from_rsrc_arn for original backups
              return ""

            @property
            def arn(self):
              return self._from_event.get("resources", [""])[0]

            # pylint: enable=missing-function-docstring

            def valid(self):
              """Return True if all required attributes are non-empty

              A cursory validation, but EventBridge filters in CloudFormation
              allow only acceptable events, which come from AWS Backup.
              """
              return all([self.from_job_id, self.arn])  # More attributes coming!

            def log_action(self, action_name, action_kwargs, exception=None, resp=None):
              """Log the AWS Lambda event and the outcome of an action on a backup
              """
              log_level = logging.INFO if boto3_success(resp) else logging.ERROR
              log("LAMBDA_EVENT", self.from_event, log_level)
              log(f"{action_name.upper()}_KWARGS", action_kwargs, log_level)
              if exception is not None:
                log("EXCEPTION", exception, log_level)
              elif resp is not None:
                log("AWS_RESPONSE", resp, log_level)

            def do_action(
              self, action_name, kwargs_add={}, validate_backup=True
            ):  # pylint: disable=dangerous-default-value
              """Take an AWS Backup method and kwargs, log outcome, and return response
              """
              action_kwargs = self.action_kwargs_base.get(
                action_name, self.action_kwargs_base["DEFAULT"]
              ) | kwargs_add  # Copy, don't update!
              resp = None
              if validate_backup and not self.valid():
                self.log_action(action_name, action_kwargs)
              else:
                action_method = getattr(self.get_boto3_client(), action_name)
                try:
                  resp = action_method(**action_kwargs)
                except Exception as misc_exception:
                  self.log_action(action_name, action_kwargs, exception=misc_exception)
                  raise
                self.log_action(action_name, action_kwargs, resp=resp)
              return resp


          class BackupCopy(Backup):
            """AWS Backup recovery point copy, from start_copy_job

            https://docs.aws.amazon.com/aws-backup/latest/devguide/eventbridge.html#copy-job-state-change-completed

            Why didn't AWS use the same structure and keys for start_backup_job
            and the destination half of start_copy_job ? Both methods put a
            backup into a destination vault!
            """
            _from_job_id_key = "copyJobId"

            @property
            def from_backup_arn(self):
              # Reserve from_rsrc_arn for original backups
              return self._from_event.get("resources", [""])[0]

            @property
            def arn(self):  # pylint: disable=missing-function-docstring
              return self._from_job_details.get("destinationRecoveryPointArn", "")

            def valid(self):
              """Return True if all required attributes are non-empty

              A cursory validation, but EventBridge filters in CloudFormation
              allow only acceptable events, which come from AWS Backup.
              """
              return all([self.from_job_id, self.from_backup_arn, self.arn])


          def get_update_lifecycle_kwargs(describe_resp, today_date):
            """Take a describe response, return update_recovery_point_lifecycle kwargs

            Sets/reduces DeleteAfterDays, so a backup that has been copied to
            another vault can be scheduled for deletion from the original vault.
            If the result dict is empty, no lifecycle update is needed.

            Warnings:

            - Before calling describe_recovery_point , use tzset to set the local
              time zone to UTC, for correct results.

            - For safety, this function works in UTC whole days, stripping time
              and leaving a whole-day margin ( +1 and strict < inequality ).
              AWS Backup measures lifecycles (MoveToColdStorageAfterDays,
              DeleteAfterDays) in whole days, but CreationDate -- misnamed --
              includes a precise time, and then deletion occurs "at a randomly
              chosen point over the following 8 hours".
              https://docs.aws.amazon.com/aws-backup/latest/devguide/recov-point-create-on-demand-backup.html
            """
            kwargs_out = {}
            lifecycle = dict(describe_resp.get("Lifecycle", {}))  # Update the copy...

            creation_date = describe_resp["CreationDate"].date()
            days_old = (today_date - creation_date).days + 1
            delete_after_days_minima = [days_old, 1, NEW_DELETE_AFTER_DAYS]
            delete_after_days_maximum = lifecycle.get("DeleteAfterDays")  # Don't delay

            storage_class = describe_resp.get("StorageClass")
            cold_storage_after_days = (
              lifecycle.get("MoveToColdStorageAfterDays")
              if lifecycle.get("OptInToArchiveForSupportedResources", False)
              else None
            )

            if storage_class == "DELETED":
              delete_after_days_maximum = 0
            elif cold_storage_after_days is not None:
              if (storage_class == "WARM") and (days_old < cold_storage_after_days):
                # Has not yet transitioned to cold storage, and is not scheduled
                # to, soon
                lifecycle.update({
                  "OptInToArchiveForSupportedResources": False,
                  "MoveToColdStorageAfterDays": -1,
                })
              else:
                # Has already transitioned to cold storage, or is scheduled to,
                # soon
                delete_after_days_minima.append(cold_storage_after_days + 90)
            elif storage_class == "COLD":
              # In case AWS Backup someday supports creation in/non-scheduled
              # move to cold storage, could have entered cold storage as late
              # as today
              delete_after_days_minima.append(days_old + 90)

            delete_after_days = max(delete_after_days_minima) + 1
            if (
              (delete_after_days_maximum is None)
              or (delete_after_days < delete_after_days_maximum)
            ):
              lifecycle["DeleteAfterDays"] = delete_after_days
              kwargs_out = {"Lifecycle": lifecycle}

            return kwargs_out


          def lambda_handler_copy(event, context):  # pylint: disable=unused-argument
            """Copy a backup to a vault in another AWS account OR another region
            """
            backup = Backup.new(event)
            backup.do_action(
              "start_copy_job",
              {
                "RecoveryPointArn": backup.arn,
                "IdempotencyToken": backup.from_job_id,
              }
            )


          def lambda_handler_update_lifecycle(event, context):  # pylint: disable=unused-argument
            """Schedule deletion of a backup that has been copied to another vault

            Warning:

            - Before calling describe_recovery_point , use tzset to set the local
              time zone to UTC, for correct results.
            """
            backup = Backup.new(event)
            kwargs_operand = {"RecoveryPointArn": backup.from_backup_arn}
            describe_resp = backup.do_action("describe_recovery_point", kwargs_operand)
            if boto3_success(describe_resp):
              kwargs_lifecycle = get_update_lifecycle_kwargs(
                describe_resp, datetime.date.today()
              )
              if kwargs_lifecycle:
                backup.do_action(
                  "update_recovery_point_lifecycle",
                  kwargs_lifecycle | kwargs_operand,
                  validate_backup=False
                )

          # ZIPFILE_END
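          # For reference, a trimmed sketch (illustrative values only) of the
          # "Copy Job State Change" COMPLETED event that reaches these
          # handlers; see the AWS Backup EventBridge documentation linked in
          # the BackupCopy docstring:
          #
          #   {
          #     "version": "0",
          #     "source": "aws.backup",
          #     "detail-type": "Copy Job State Change",
          #     "resources": ["arn:aws:backup:...:recovery-point:..."],
          #     "detail": {
          #       "copyJobId": "...",
          #       "state": "COMPLETED",
          #       "destinationRecoveryPointArn": "arn:aws:backup:..."
          #     }
          #   }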