feat: Bedrock cost attribution — session tags, request metadata, and operator FinOps guidance (#215) #521

Merged: krokoko merged 8 commits into main from feat/215-bedrock-cost-attribution on Jun 30, 2026

@krokoko krokoko commented Jun 30, 2026

Summary

Implements Bedrock cost attribution (#215): attribute Bedrock model-inference spend per user and repo, complementing the in-app cost_usd meter and #211 per-session tenant isolation.

The key architectural fact driving the design: Bedrock is invoked by the Claude Code CLI subprocess (CLAUDE_CODE_USE_BEDROCK=1), not by the agent's boto3. So attribution cannot be wired through aws_session.py (which scopes DynamoDB/S3 tenant data) — both levers live in Claude Code's own configuration, set by the agent before it spawns the subprocess.

| Track | Mechanism | Surfaces in |
|---|---|---|
| 1 — IAM session-tag chargeback | bedrock:InvokeModel* granted to the existing AgentSessionRole; Claude Code's awsCredentialExport runs bedrock_creds_helper.py, which assumes that role with {user_id, repo, task_id} STS session tags | AWS Cost Explorer / CUR 2.0 (iamPrincipal/ prefix), aggregated |
| 2 — per-call forensics | X-Amzn-Bedrock-Request-Metadata set via ANTHROPIC_CUSTOM_HEADERS on the subprocess env | Bedrock model-invocation logs (requestMetadata field), per call |
| 3 — operator guide | docs/guides/COST_ATTRIBUTION.md + cross-links | |

Tracks 1 and 2 are complementary (per AWS docs): session tags give aggregated billing chargeback (they are not written to invocation logs); request metadata gives per-call detail in logs (it is not a cost-allocation tag). You need both.

What was implemented

Track 1 — session-tagged credentials

  • cdk/src/constructs/agent-session-role.ts — new optional invokableModels prop. Each invokable is grantInvoke-ed to the SessionRole. This is the same grant the compute role receives, reused rather than built from hand-rolled ARNs, so cross-region inference profiles fan out to every routed region and calls can't fail with AccessDenied. No aws:PrincipalTag condition (the tags are for billing, not access scoping). Scoped to explicit model/profile ARNs, never Resource:'*'.
  • cdk/src/stacks/agent.ts — builds the invokable set in one loop (rebased onto resolveBedrockModelIds from #434, "feat(cdk): single source of truth for invocable Bedrock models, context-overridable (#433)") and grants both the runtime and the SessionRole from the same list so the two grants can't drift.
  • agent/src/bedrock_creds_helper.py (new) — invoked by awsCredentialExport. Reads a 0600 file (SessionRole ARN + tags), assumes the role, emits {"Credentials":{…,"Expiration":…}}. The real Expiration drives Claude Code's pre-expiry refresh, beating the 1 h role-chaining cap on long tasks. Fails open to ambient compute-role credentials (this is a billing/observability control, not tenant isolation — contrast aws_session.py's fail-closed path).
  • agent/managed-settings.json + Dockerfile — awsCredentialExport lives in root-owned /etc/claude-code/managed-settings.json (copied before USER agent). Highest-precedence settings tier, loaded regardless of setting_sources=["project"], so the untrusted cloned repo cannot override it (it runs an arbitrary command — putting it anywhere the repo can influence would be RCE with the compute role).
  • agent/src/aws_session.py — extracted build_session_tags() so the in-process tenant path and the out-of-process Bedrock helper mint identical tags from one definition.
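
To make the Track 1 flow concrete, here is a minimal sketch of a credential helper in the spirit of bedrock_creds_helper.py. It is not the code from this PR. The config keys (role_arn, user_id, repo, task_id), the function names, and the injected assume_role callable are assumptions made for illustration; in a real helper assume_role would be the method of a boto3 STS client. What the real helper prints when it fails open is not shown in this description, so the sketch simply writes nothing to stdout and returns False.

```python
import json
import sys
from datetime import datetime, timezone


def build_session_tags(user_id, repo, task_id):
    """Return STS session tags in the Key/Value list shape AssumeRole accepts."""
    pairs = (("user_id", user_id), ("repo", repo), ("task_id", task_id))
    return [{"Key": key, "Value": value} for key, value in pairs]


def export_credentials(config, assume_role, out=sys.stdout, err=sys.stderr):
    """Write tagged credentials as JSON to `out`; return False when failing open.

    `config` keys and the injected `assume_role` callable are assumptions of
    this sketch, not the PR's actual interface.
    """
    try:
        response = assume_role(
            RoleArn=config["role_arn"],
            RoleSessionName=f"abca-bedrock-{config['task_id']}",
            Tags=build_session_tags(
                config["user_id"], config["repo"], config["task_id"]
            ),
        )
    except Exception as exc:  # sketch only; the PR narrows this to botocore errors
        # stdout is reserved for the credential JSON, so diagnostics go to stderr.
        print(f"bedrock-creds: assume failed, using ambient credentials: {exc}", file=err)
        return False

    credentials = dict(response["Credentials"])
    expiration = credentials.get("Expiration")
    if isinstance(expiration, datetime):
        # boto3 returns a datetime; JSON needs a string.
        credentials["Expiration"] = expiration.astimezone(timezone.utc).isoformat()
    json.dump({"Credentials": credentials}, out)
    return True
```

Passing the STS call in as a parameter keeps the sketch testable without AWS access and keeps the stdout/stderr split explicit.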

Track 2 — per-request metadata

  • agent/src/runner.py — _setup_bedrock_cost_attribution() writes the attribution file and sets ANTHROPIC_CUSTOM_HEADERS on the process env (deliberate, documented exception to the "tenant ids out of os.environ" rule — the values are self-referential and non-secret; json.dumps escaping blocks header injection).
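
A small sketch of the header assembly described above, assuming the header is passed in "Name: value" form and that the metadata is a flat JSON object. The function name build_custom_headers and the compact separators are illustrative choices, not taken from runner.py. The point it demonstrates is that json.dumps escapes CR and LF inside string values, so a hostile repo name cannot start a second header line.

```python
import json

METADATA_HEADER = "X-Amzn-Bedrock-Request-Metadata"


def build_custom_headers(user_id, repo, task_id):
    """Return a single 'Name: value' header line carrying the attribution keys."""
    metadata = json.dumps(
        {"user_id": user_id, "repo": repo, "task_id": task_id},
        separators=(",", ":"),  # compact form; an assumption of this sketch
    )
    # json.dumps renders control characters as \r, \n, \uXXXX escapes, so the
    # returned string never contains a raw line break.
    return f"{METADATA_HEADER}: {metadata}"
```

The caller would then assign the returned string to ANTHROPIC_CUSTOM_HEADERS in the environment before spawning the subprocess.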

Track 2 prerequisite fix — model-invocation logging now actually enables on deploy

Found during live verification: the ModelInvocationLogging custom resource sent largeDataDeliveryS3Config with an empty bucketName, which Bedrock rejects client-side (ValidationException, min length: 3). With ignoreErrorCodesMatching: '.*' swallowing it and onUpdate never re-firing (static props), a fresh deploy silently left model-invocation logging disabled — so Bedrock recorded no requestMetadata and Track 2 produced nothing to query.

  • Omit largeDataDeliveryS3Config entirely (optional; only for S3 large-data delivery, unused here).
  • Narrow ignoreErrorCodesMatching from .* to transient service errors (Throttling/ServiceUnavailable/InternalServer) so a client-side misconfiguration fails the deploy loudly instead of disabling logging silently.
  • Grant iam:PassRole on BedrockLoggingRole to the custom resource's role. PutModelInvocationLoggingConfiguration hands that role to the Bedrock service (to write the log group), so the caller needs PassRole on it. This was a second latent bug also masked by the .* ignore — narrowing the ignore made it surface and fail the deploy (as intended); fixed here, scoped to the one role ARN (not a wildcard).
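
The effect of narrowing the ignore pattern can be illustrated with plain regular expressions. This sketch assumes the pattern is applied as an unanchored search against the error code, and the narrowed pattern below is a guess at a reasonable spelling rather than the exact string in agent.ts.

```python
import re

CATCH_ALL = r".*"
# Hypothetical spelling of the narrowed pattern: transient service errors only.
TRANSIENT_ONLY = r"Throttling|ServiceUnavailable|InternalServer"


def is_ignored(pattern, error_code):
    """True if an error with this code would be swallowed under `pattern`."""
    return re.search(pattern, error_code) is not None


for code in ("ThrottlingException", "ValidationException", "AccessDeniedException"):
    print(code, is_ignored(CATCH_ALL, code), is_ignored(TRANSIENT_ONLY, code))
```

Under the catch-all every code is swallowed, including the ValidationException and the missing-PassRole denial described above. Under the narrowed pattern only the transient codes are swallowed, so both misconfigurations fail the deploy.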

Observability hardening (from review)

  • agent/src/bedrock_creds_helper.py — every fail-open path now logs to stderr (stdout is the credential channel Claude Code parses). Distinguishes severities: absent file (benign) vs present-but-unreadable (a write bug); expected ClientError/BotoCoreError assume failure vs UNEXPECTED errors; ImportError on boto3 (packaging defect). All still fail open — but a persistent degradation is now visible, not invisible.
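
As an illustration of the "absent vs present-but-unreadable" distinction, the following sketch loads an attribution file and logs a different severity for each case. The function name, message wording, and INFO/ERROR labels are assumptions; only the stderr-only rule and the two severities come from the description above.

```python
import json
import sys


def load_attribution_config(path, err=sys.stderr):
    """Return the parsed config, or None (fail open) with a stderr diagnostic."""
    try:
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)
    except FileNotFoundError:
        # Benign: attribution was never set up for this process.
        print(f"bedrock-creds: INFO no attribution file at {path}", file=err)
    except (OSError, ValueError) as exc:
        # The file exists but cannot be read or parsed, which points at a write bug.
        print(f"bedrock-creds: ERROR attribution file {path} unreadable: {exc}", file=err)
    return None
```

Both branches return None so the caller still falls back to ambient credentials, but a persistent ERROR line is something an operator can search for.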

Version alignment

  • claude-agent-sdk==0.2.110 (pyproject) ↔ npm @anthropic-ai/claude-code@2.1.191 (Dockerfile) pinned in lockstep — the SDK bundles a CLI and both must agree on the control protocol; 2.1.191 also has the awsCredentialExport-with-Expiration refresh behavior the design relies on.

Docs

  • New docs/design/BEDROCK_COST_ATTRIBUTION.md and docs/guides/COST_ATTRIBUTION.md (operator FinOps guide), cross-linked from COST_MODEL.md and DEPLOYMENT_GUIDE.md. Includes a prominent warning that in-app cost_usd is a client-side SDK estimate, not authoritative billing (mirroring the Claude Agent SDK cost-tracking caveat, adapted for Bedrock → authoritative source is AWS Cost Explorer/CUR), the correct (post-deploy, non-pre-activatable) cost-allocation-tag ordering, and how to verify/re-enable model-invocation logging. Starlight mirrors synced.

What was tested

Automated — full suites green:

  • CDK: 122 suites / 2211 tests pass. New: agent-session-role.test.ts asserts the Bedrock grant is present (scoped, no Resource:'*') when invokableModels is set and absent when omitted; agent.test.ts regression guards that the logging custom resource never sends largeDataDeliveryS3Config, never uses a catch-all error ignore, and grants iam:PassRole on the logging role.
  • Agent: 1100 tests pass, 79.7% coverage (gate 72%). test_bedrock_creds_helper.py covers the tagged assume + session name, all fail-open paths (absent/corrupt config, ClientError, unexpected error, no-creds), 0600 file mode, and the stderr diagnostics; test_runner.py covers attribution-file write + header assembly.
  • Lint (ruff, eslint) clean; cdk synth clean (cdk-nag passes).

Manual review — ran the PR-review toolkit (code-reviewer, silent-failure-hunter) and a security review. No CRITICAL/HIGH findings. The silent-failure review drove the observability hardening above. Security review verified the RCE boundary (root-owned managed-settings, repo can't override), IAM least-privilege (scoped grant, no wildcard), 0600 atomic write, no secret logging, and that json.dumps defeats header injection.
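
For readers unfamiliar with the "0600 atomic write" property the security review checked, here is one common way to get it on POSIX systems. This is a generic sketch, not the code in runner.py; the function name and temp-file prefix are hypothetical.

```python
import contextlib
import json
import os
import tempfile


def write_private_json(path, payload):
    """Atomically write `payload` as JSON to `path` with mode 0600 (POSIX only)."""
    directory = os.path.dirname(os.path.abspath(path))
    # mkstemp creates the file in the target directory, readable by the owner only.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".attribution-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            os.fchmod(handle.fileno(), 0o600)  # make the mode explicit
            json.dump(payload, handle)
            handle.flush()
            os.fsync(handle.fileno())
        # Rename within one directory is atomic: readers see old or new, never partial.
        os.replace(tmp_path, path)
    except BaseException:
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_path)
        raise
```

Writing to a temporary file in the same directory and renaming it means the helper never observes a half-written file, and the restrictive mode is in place before any content is written.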

Live verification (deployed dev stack, us-east-1): a real agent task's Bedrock calls show all three metadata fields in the invocation logs, signed by the session-tagged role — proving both tracks end-to-end (Track 2 via requestMetadata, Track 1 via the abca-bedrock-<task_id> session ARN). This also resolves the one risk flagged in the design as unverified: Claude Code does sign the X-Amzn-Bedrock-Request-Metadata header. Redacted sample log record:

{
  "requestMetadata": {
    "user_id": "<redacted-cognito-sub>",
    "repo": "<owner>/<repo>",
    "task_id": "<task-ulid>"
  },
  "modelId": "arn:aws:bedrock:us-east-1:<account>:inference-profile/us.anthropic.claude-sonnet-4-6",
  "identity": {
    "arn": "arn:aws:sts::<account>:assumed-role/<stack>-AgentSessionRole<id>/abca-bedrock-<task-ulid>"
  }
}
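
A record shaped like the sample above can be cross-checked programmatically. The sketch below assumes only the fields visible in the sample (requestMetadata and identity.arn) and the abca-bedrock- session-name prefix; the function name and the returned keys are illustrative.

```python
SESSION_PREFIX = "abca-bedrock-"


def attribution_from_log_record(record):
    """Extract attribution keys and check that both tracks name the same task."""
    metadata = record.get("requestMetadata") or {}
    # The session name is the last path segment of the assumed-role ARN.
    session_name = record.get("identity", {}).get("arn", "").rsplit("/", 1)[-1]
    session_task = None
    if session_name.startswith(SESSION_PREFIX):
        session_task = session_name[len(SESSION_PREFIX):]
    return {
        "user_id": metadata.get("user_id"),
        "repo": metadata.get("repo"),
        "task_id": metadata.get("task_id"),
        "session_task_id": session_task,
        # True only when the Track 2 metadata and the Track 1 session agree.
        "tracks_agree": session_task is not None
        and session_task == metadata.get("task_id"),
    }
```

A record where tracks_agree is False would suggest the helper failed open (ambient credentials signed the call) while the request-metadata header was still sent.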

Clean-deploy verification of the logging fixes: I reset the account's model-invocation logging config to empty, then ran cdk deploy of this branch. The ModelInvocationLogging custom resource went UPDATE_COMPLETE (previously UPDATE_FAILED), and the live config came back enabled by the deploy itself (pointing at the stack's own log group + BedrockLoggingRole) — confirming the full chain end-to-end: empty-bucket error removed → masking narrowed → iam:PassRole granted. Stack UPDATE_COMPLETE.

Note: the cost-allocation-tag activation (Cost Explorer / CUR side) remains an operator step documented in the guide and cannot be pre-activated — the tag keys only appear after the first tagged call.

Notes for reviewers

Closes #215

bgagent added 7 commits June 30, 2026 16:05
Design for per-user/per-repo Bedrock spend attribution. Key finding:
Bedrock is invoked by the Claude Code CLI subprocess, not the agent's
boto3, so both tracks (IAM session tags + request metadata) are wired
via Claude Code config (awsCredentialExport, ANTHROPIC_CUSTOM_HEADERS)
and a new BedrockInvokeRole — not by extending aws_session.py.

Refs #215
…ata (#215)

Attribute Bedrock model-inference spend per user/repo. Bedrock is invoked
by the Claude Code subprocess (CLAUDE_CODE_USE_BEDROCK=1), so attribution is
wired through Claude Code's config, not the agent's boto3.

Track 1 — IAM session-tag chargeback (CUR 2.0 / Cost Explorer):
- Grant bedrock:InvokeModel* on the existing AgentSessionRole (reuse, not a
  new role) via grantInvoke, mirroring the compute-role grant exactly so
  cross-region profiles never AccessDenied. Compute role keeps its grant.
- bedrock_creds_helper.py assumes the SessionRole with {user_id,repo,task_id}
  STS tags and emits creds JSON for Claude Code's awsCredentialExport, which
  refreshes before the 1h role-chaining cap. Fails OPEN to ambient creds
  (billing control, not isolation). awsCredentialExport lives in root-owned
  /etc/claude-code/managed-settings.json so the untrusted repo can't override
  it (RCE boundary).

Track 2 — per-call forensics (model-invocation logs):
- Set X-Amzn-Bedrock-Request-Metadata via ANTHROPIC_CUSTOM_HEADERS on the
  subprocess env (one container = one task, so static-per-process is per-task;
  process-env so the repo can't alter it). SigV4 signed-headers behavior to be
  validated live (AC#3 documented-blocker path).

Track 3 — operator guide COST_ATTRIBUTION.md + cross-links, plus a prominent
warning that in-app cost_usd is a client-side SDK estimate (authoritative
source is AWS Cost Explorer / CUR 2.0), mirroring the Claude Agent SDK
cost-tracking caveat.

Align claude-agent-sdk 0.2.110 (bundles CLI 2.1.191) with the npm CLI pin.

Tests: CDK Bedrock grant present/absent; helper assume + fail-open paths;
runner file+header wiring. #211 tenant-isolation path untouched.

Refs #215
)

PR #434 replaces the six named model/profile bindings in agent.ts with a
loop over a single source-of-truth id list. Our #215 SessionRole grant
referenced those bindings by name, so the merge would break compilation.

Adopt #434's loop+collection shape now: build each foundation model + its
cross-region profile in a loop, grant the runtime, and collect into one
list passed to AgentSessionRole.invokableModels. Behavior is byte-for-byte
identical in synth; the eventual #434 merge becomes a one-line swap of the
local id array for resolveBedrockModelIds(this.node).

Refs #215, #434
…review)

Silent-failure review flagged that bedrock_creds_helper.py degraded silently:
a persistent assume-role denial would drop chargeback for weeks with no signal
pointing back to this code — the 'invisible degradation' AI004 forbids even
when the fallback itself is intended.

- Add _warn() (stderr only — stdout is the credential channel Claude Code
  parses, so shell.log/fd1 is unusable here).
- Log every fail-open path; distinguish severities: absent file (benign) vs
  present-but-unreadable (write bug), and expected ClientError/BotoCoreError
  assume failure vs UNEXPECTED errors.
- Narrow the assume catch to (ClientError, BotoCoreError); catch ImportError on
  boto3 separately (packaging defect, not AccessDenied). All still fail open.

Behavior unchanged (still fail-open to ambient creds); degradations are now
visible and correlatable. Tests cover each distinguished path + its diagnostic.

Refs #215
Security review (LOW/accepted): unlike tenant-data tags, the request-metadata
header lives on os.environ because Claude Code reads it from there. Document
why that's safe (self-referential non-secret values; json.dumps escaping blocks
header injection) in both the code and the design doc, so it reads as intent
rather than an oversight against the 'tenant ids out of os.environ' discipline.

Refs #215
The IAM-principal tag keys can't be pre-activated — they only appear in the
Billing console after the platform makes tagged Bedrock calls. Fix the ordering
(deploy → run task → wait ≤24h → activate), point to Billing → Cost allocation
tags (not Tag Editor / Resource Groups, which lists resource types), and note
the capability may not be enabled in every account/region yet.

Refs #215
The ModelInvocationLogging custom resource sent largeDataDeliveryS3Config
with an empty bucketName. Bedrock rejects that client-side (ValidationException,
'min length: 3'), and ignoreErrorCodesMatching: '.*' swallowed it while onUpdate
never re-fired (static props) — so a fresh deploy silently left model-invocation
logging DISABLED, and Bedrock recorded no requestMetadata (#215 Track 2 produced
nothing to query). Found during live verification of task 01KWD7S....

- Omit largeDataDeliveryS3Config entirely (optional; only for S3 large-data
  delivery, which this stack doesn't use). The 'required by API schema' comment
  was wrong.
- Narrow ignoreErrorCodesMatching from '.*' to transient service errors only
  (Throttling/ServiceUnavailable/InternalServer) so a client-side
  misconfiguration fails the deploy loudly instead of disabling logging silently.
- Tests: assert the CR never sends largeDataDeliveryS3Config and never uses a
  catch-all error ignore.
- Docs: COST_ATTRIBUTION.md now tells operators to verify logging is on in the
  agent's Region (get-model-invocation-logging-configuration) and how to
  re-enable it, since metadata is only recorded when logging is active.

Verified live: with logging on, invocation logs show requestMetadata.{user_id,
repo,task_id} and the abca-bedrock-<task_id> session ARN — Tracks 1 and 2 both
confirmed working end-to-end.

Refs #215
@krokoko krokoko requested review from a team as code owners June 30, 2026 22:01
isadeks previously approved these changes Jun 30, 2026
…source (#215)

With the empty-bucket validation error fixed, PutModelInvocationLoggingConfiguration
now actually reaches Bedrock at deploy — and fails because the custom resource's
Lambda role lacks iam:PassRole on BedrockLoggingRole (the role it hands to the
Bedrock service to write the log group). This was masked by the earlier
client-side ValidationException that ignoreErrorCodesMatching: '.*' swallowed.

Add iam:PassRole scoped to the BedrockLoggingRole ARN (not a wildcard). Test
asserts the grant is present.

Refs #215
@krokoko krokoko enabled auto-merge June 30, 2026 22:17
@krokoko krokoko added this pull request to the merge queue Jun 30, 2026
Merged via the queue into main with commit 53a13cb Jun 30, 2026
8 of 9 checks passed
@krokoko krokoko deleted the feat/215-bedrock-cost-attribution branch June 30, 2026 22:28