diff --git a/docs/abca-plugin/skills/onboard-repo/SKILL.md b/docs/abca-plugin/skills/onboard-repo/SKILL.md index 7abc3caf..3f7e505a 100644 --- a/docs/abca-plugin/skills/onboard-repo/SKILL.md +++ b/docs/abca-plugin/skills/onboard-repo/SKILL.md @@ -1,139 +1,152 @@ --- name: onboard-repo description: >- - Onboard a new GitHub repository to the ABCA platform by adding a Blueprint CDK - construct. Use when the user says "onboard a repo", "add a repository", - "register a repo", "new repo", "Blueprint construct", "REPO_NOT_ONBOARDED error", - or gets a 422 error about an unregistered repository. + Onboard a new GitHub repository to the ABCA platform so the agent can target it. + Use when the user says "onboard a repo", "add a repository", "register a repo", + "new repo", or gets a `REPO_NOT_ONBOARDED` / 422 error about an unregistered + repository. --- # Repository Onboarding -You are guiding the user through onboarding a new GitHub repository to ABCA. Repositories must be registered as `Blueprint` constructs in the CDK stack before tasks can target them. +You are helping an **operator** register a GitHub repository with their running ABCA +deployment so tasks can target it. -## Step 1: Gather Repository Details +There are two paths. -Use AskUserQuestion to collect: -- **Repository**: GitHub `owner/repo` format -- **Compute type**: `agentcore` (default) or `ecs` -- **Model preference**: Claude Sonnet 4 (default), Claude Opus 4 (complex repos), or Claude Haiku (lightweight). **Important:** Models must be specified using their cross-region inference profile ID (e.g. `us.anthropic.claude-opus-4-20250514-v1:0`), not the raw foundation model ID. On-demand invocation of raw model IDs is not supported for most models. 
-- **Max turns**: Default 100 (range: 1-500) -- **Max budget**: USD cost ceiling per task (optional) -- **Custom GitHub PAT**: If this repo needs a different token than the platform default +**Prefer the CLI operator path (Path A)** when the repo can run on the +**platform/default-blueprint** setup — the default GitHub token secret, a model +already granted to the runtime, and the default egress allowlist. It's a single +runtime command against the deployed stack: no code change, no redeploy. -## Step 2: Read the Current Stack +**Use the CDK Blueprint path (Path B)** when the repo needs its **own** config that +the CLI can't provision at runtime — a per-repo GitHub token, a model not yet +granted to the runtime, custom egress domains, Cedar HITL policies, or +system-prompt overrides. These are baked into infrastructure and require a redeploy +(with the correct permissions). When in doubt, start with Path A; if a task later +fails on a missing token / model grant / blocked egress, promote the repo to a +Blueprint. -Read the CDK stack file to understand existing Blueprint definitions: +> **This is an operation, not a contribution.** Onboarding a repo into your own +> deployment writes a record to the platform's RepoTable — it is **not** a change to +> the `aws-samples` codebase, so the ADR-003 contribution flow (GitHub issue → +> approval → feature branch) does **not** apply. Only invoke ADR-003 if the user is +> actually changing the platform source (e.g. wiring a brand-new Bedrock model into +> the stack — see "Model not yet wired into the runtime" below). +## Path A — CLI operator onboarding (default) + +`bgagent repo onboard` writes (or re-activates) the repository's `RepoConfig` row in +the deployed RepoTable directly. It takes effect immediately — **no `agent.ts` edit, +no `cdk deploy`.** + +```bash +bgagent repo onboard +# common overrides: +# --model e.g. 
us.anthropic.claude-sonnet-4-6 +# --compute-type +# --max-turns +# --token-secret-arn per-repo GitHub token (else platform default) ``` -Read cdk/src/stacks/agent.ts + +Then confirm it landed: + +```bash +bgagent repo list # status should be "active" +bgagent repo show # full resolved config (secret ARNs redacted) ``` -Identify: -- Where existing Blueprint constructs are defined -- The `repoTable` reference used -- Any patterns for compute/model overrides +That's it — the repo is onboarded. Submit a task with the `submit-task` skill. -### Sample blueprint repo without a code change +**Pick a model that is already wired into the runtime.** With no `--model`, the repo +uses the platform default (Sonnet 4.6). If you pass `--model`, use a cross-Region +**inference profile ID** (e.g. `us.anthropic.claude-sonnet-4-6`), not a raw +`anthropic.*` foundation-model ID. Only models the stack has granted the runtime can +be invoked — see "Model not yet wired into the runtime" before choosing a model the +deployment doesn't already support. -The stack’s **AgentPlugins** blueprint uses a `repo` value resolved in this order: **`BLUEPRINT_REPO`** (environment variable) → CDK context **`blueprintRepo`** → default `awslabs/agent-plugins` (see `blueprintRepo` in `cdk/src/stacks/agent.ts`). If the user only needs to target a fork of the sample repo, they can set `export BLUEPRINT_REPO=owner/repo` or pass `-c blueprintRepo=owner/repo` (or set `"context": { "blueprintRepo": "..." }` in `cdk/cdk.json`) and redeploy, instead of adding a new `Blueprint` construct. +## Path B — CDK Blueprint (declarative / canonical) -## Step 3: Add the Blueprint Construct +Use this when the operator wants the repo committed to infrastructure-as-code (so a +fresh deploy re-creates it) rather than set as a runtime record. This **does** require +editing the stack and redeploying. -Add a new `Blueprint` construct instance to the stack. Follow the existing pattern. Example: +1. 
Read `cdk/src/stacks/agent.ts` to find where `Blueprint` constructs are defined and + the `repoTable` reference. +2. Add a construct following the existing pattern: -```typescript -new Blueprint(this, 'MyRepoBlueprint', { - repo: 'owner/repo', - repoTable: repoTable.table, - // Optional overrides: - // computeType: 'agentcore', - // modelId: 'us.anthropic.claude-sonnet-4-20250514-v1:0', - // maxTurns: 100, - // maxBudgetUsd: 50, - // runtimeArn: runtime.runtimeArn, - // githubTokenSecretArn: 'arn:aws:secretsmanager:...', -}); -``` + ```typescript + new Blueprint(this, 'MyRepoBlueprint', { + repo: 'owner/repo', + repoTable: repoTable.table, + // Optional overrides: + // computeType: 'agentcore', + // modelId: 'us.anthropic.claude-sonnet-4-6', + // maxTurns: 100, + // maxBudgetUsd: 50, + // githubTokenSecretArn: 'arn:aws:secretsmanager:...', + }); + ``` -Use a descriptive construct ID derived from the repo name. +3. Redeploy: `mise //cdk:compile` → `mise //cdk:diff` (show the diff) → `mise //cdk:deploy -- --require-approval never`. -### Model ID and IAM Permissions +> **Sample-repo shortcut:** the stack's AgentPlugins blueprint resolves its `repo` from +> `BLUEPRINT_REPO` (env) → CDK context `blueprintRepo` → default `awslabs/agent-plugins`. +> To target a fork of the sample without adding a construct, set +> `export BLUEPRINT_REPO=owner/repo` (or `cdk.json` context) and redeploy. -When specifying a non-default model via `agent.modelId`, three things are required: +## Model not yet wired into the runtime (the one real code change) -1. **Use the inference profile ID, not the raw model ID, when Bedrock requires it.** For `InvokeModel` / streaming, specify the cross-Region **inference profile** identifier (or ARN) where the Bedrock User Guide calls for it — not only the bare `anthropic.*` foundation model ID. 
Examples: - - Sonnet 4.6 (US geography profile): `us.anthropic.claude-sonnet-4-6` - - Sonnet 4: `us.anthropic.claude-sonnet-4-20250514-v1:0` - - Opus 4: `us.anthropic.claude-opus-4-20250514-v1:0` - - Haiku 4.5: `us.anthropic.claude-haiku-4-5-20251001-v1:0` +A repo can only use a model the **runtime IAM role has `grantInvoke` for**. As of now +the stack wires **Sonnet 4.6, Opus 4 (`claude-opus-4-20250514`), and Haiku 4.5** (see +the `grantInvoke` block in `agent.ts`). Onboarding a repo pinned to any **other** model +(e.g. Opus 4.8 / `us.anthropic.claude-opus-4-8`) will fail at invoke with a 403 — the +CLI onboard succeeds, but tasks can't run. - See [Use an inference profile in model invocation](https://docs.aws.amazon.com/bedrock/latest/userguide/inference-profiles-use.html). +Adding a new model **is** a platform source change, so it follows ADR-003 (issue → +approval → feature branch) and requires: -2. **Grant the runtime IAM permissions for the model.** The Blueprint construct does not automatically grant `bedrock:InvokeModel*` — this is by design (least privilege). You must add a `grantInvoke` block in the stack for each model used: +1. **Wire the model + inference profile and grant the runtime**, in `agent.ts`: ```typescript - const opusModel = new bedrock.BedrockFoundationModel('anthropic.claude-opus-4-20250514-v1:0', { + const model = new bedrock.BedrockFoundationModel('anthropic.claude-opus-4-8', { supportsAgents: true, supportsCrossRegion: true, }); - opusModel.grantInvoke(runtime); - - const opusProfile = bedrock.CrossRegionInferenceProfile.fromConfig({ + model.grantInvoke(runtime); + const profile = bedrock.CrossRegionInferenceProfile.fromConfig({ geoRegion: bedrock.CrossRegionInferenceProfileRegion.US, - model: opusModel, + model, }); - opusProfile.grantInvoke(runtime); + profile.grantInvoke(runtime); ``` + then redeploy. +2. 
**Account-level Bedrock model access** (separate from IAM): the account must have the + model enabled for the Region — complete [model access](https://docs.aws.amazon.com/bedrock/latest/userguide/model-access.html) + prerequisites (Marketplace actions / Anthropic first-time use where applicable). For + cross-Region profiles, IAM and SCPs must allow Bedrock in source **and** destination + Regions. -3. **Account-level Bedrock model access (separate from IAM).** The runtime role must be allowed to invoke the model, and the **AWS account** must be able to use that model in Bedrock: complete [model access](https://docs.aws.amazon.com/bedrock/latest/userguide/model-access.html) prerequisites (AWS Marketplace actions on first serverless use where applicable, Anthropic first-time use / `PutUseCaseForModelAccess` for Anthropic models, valid payment method for Marketplace-backed models). For geographic cross-Region inference profiles, IAM and SCPs must allow Bedrock in **source and destination** Regions per [Supported Regions and models for inference profiles](https://docs.aws.amazon.com/bedrock/latest/userguide/inference-profiles-support.html). - -## Step 4: Deploy - -After adding the Blueprint, the stack must be redeployed: - -```bash -export MISE_EXPERIMENTAL=1 -mise //cdk:compile # Verify TypeScript compiles -mise //cdk:test # Run tests -mise //cdk:diff # Preview changes -``` - -Show the diff to the user. If it looks correct, ask if they want to deploy now. - -```bash -mise //cdk:deploy -``` - -## Step 5: Verify - -After deployment, verify the repo config was written to DynamoDB: - -```bash -aws dynamodb scan --table-name \ - --filter-expression "repo = :r" \ - --expression-attribute-values '{":r":{"S":"owner/repo"}}' \ - --output json -``` +If the user just wants the agent working now, steer them to a wired model (Sonnet 4.6) +via Path A and treat "add model X" as a separate, later change. 
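
The two model failure modes described here (a raw foundation-model ID, and a model the runtime has no grant for) can be caught before onboarding with a small pre-flight check. The following is a minimal sketch only: `checkModelId` and `WIRED_PROFILE_IDS` are hypothetical names, not part of the `bgagent` CLI or the CDK stack, and the ID list is an assumption assembled from the identifiers quoted in this change. Verify it against the `grantInvoke` block in `agent.ts` for your deployment.

```typescript
// Hypothetical pre-flight helper; not part of bgagent or the CDK stack.
type ModelCheck = "ok" | "raw-foundation-id" | "not-wired";

// Assumed list: inference-profile IDs quoted in this change for the
// wired models. Confirm against the grantInvoke block in agent.ts.
const WIRED_PROFILE_IDS: readonly string[] = [
  "us.anthropic.claude-sonnet-4-6",
  "us.anthropic.claude-opus-4-20250514-v1:0",
  "us.anthropic.claude-haiku-4-5-20251001-v1:0",
];

function checkModelId(
  modelId: string,
  wired: readonly string[] = WIRED_PROFILE_IDS,
): ModelCheck {
  // A bare "anthropic.*" ID is a raw foundation-model ID, which the docs
  // associate with the 400 "on-demand throughput isn't supported" error.
  if (/^anthropic\./.test(modelId)) {
    return "raw-foundation-id";
  }
  // An ID the runtime role has no grant for is expected to fail with a
  // 403 at invoke time, even though the onboard step itself succeeds.
  if (wired.indexOf(modelId) === -1) {
    return "not-wired";
  }
  return "ok";
}

console.log(checkModelId("us.anthropic.claude-sonnet-4-6"));
console.log(checkModelId("anthropic.claude-opus-4-20250514-v1:0"));
console.log(checkModelId("us.anthropic.claude-opus-4-8"));
```

A `not-wired` result only says the ID is missing from the list passed in. It cannot confirm that the deployed role really holds the grant, or that account-level Bedrock model access is enabled, so treat it as a hint for choosing between Path A and the model-wiring change rather than as a guarantee.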
-## Per-Repository Configuration Reference +## Per-repository configuration reference | Setting | Purpose | Default | |---------|---------|---------| | `compute_type` | Execution strategy | `agentcore` | -| `runtime_arn` | AgentCore runtime override | Platform default | -| `model_id` | AI model for tasks | Platform default (Sonnet 4) | +| `model_id` | AI model for tasks (inference profile ID) | Platform default (Sonnet 4.6) | | `max_turns` | Turn limit per task | 100 | | `max_budget_usd` | Cost ceiling per task | Unlimited | | `system_prompt_overrides` | Custom system instructions | None | | `github_token_secret_arn` | Repo-specific GitHub token | Platform default | | `poll_interval_ms` | Completion polling frequency | 30000ms | -Task-level parameters override Blueprint defaults. If neither specifies a value, platform defaults apply. +Task-level parameters override per-repo defaults; if neither specifies a value, platform defaults apply. -## Common Issues +## Common issues -- **422 "Repository not onboarded"** — Blueprint hasn't been deployed yet. Add the construct and redeploy. -- **Preflight failures after onboarding** — GitHub PAT may lack permissions for the new repo. Check the PAT's fine-grained access includes the target repository with Contents (read/write) and Pull requests (read/write) permissions. -- **400 "Invocation with on-demand throughput isn't supported"** — The Blueprint `modelId` is using a raw foundation model ID instead of an inference profile ID. Change e.g. `anthropic.claude-opus-4-20250514-v1:0` to `us.anthropic.claude-opus-4-20250514-v1:0`. -- **403 "not authorized to perform bedrock:InvokeModelWithResponseStream"** — The runtime IAM role lacks permissions for the model specified in the Blueprint. Add `grantInvoke` for both the model and its cross-region inference profile in `agent.ts`. 
-- **Model not available / "not available on your Bedrock deployment"** — IAM is not the whole story: the account must meet [Bedrock model access](https://docs.aws.amazon.com/bedrock/latest/userguide/model-access.html) for that model family and Region, and `modelId` should be an **enabled** inference profile ID (for example `us.anthropic.claude-sonnet-4-6`) where Bedrock requires it. After fixing access in the console, align Blueprint / DynamoDB `model_id` and redeploy if you change IAM grants. +- **`REPO_NOT_ONBOARDED` / 422** — the repo isn't registered. Run `bgagent repo onboard ` (Path A). Confirm the `owner/repo` matches exactly what you pass to `bgagent submit --repo`. +- **Preflight failure after onboarding** — the GitHub PAT lacks access to the new repo. Ensure the token has Contents (read/write) + Pull requests (read/write) on it, or onboard with a repo-specific `--token-secret-arn`. +- **400 "Invocation with on-demand throughput isn't supported"** — `model_id` is a raw foundation-model ID; use the inference-profile ID (e.g. `us.anthropic.claude-sonnet-4-6`). +- **403 "not authorized to perform bedrock:InvokeModelWithResponseStream"** — the repo's model isn't wired into the runtime. See "Model not yet wired into the runtime." +- **Model not available / "not available on your Bedrock deployment"** — account-level Bedrock access isn't enabled for that model/Region (separate from IAM); complete [model access](https://docs.aws.amazon.com/bedrock/latest/userguide/model-access.html), then use an enabled inference-profile ID. diff --git a/docs/abca-plugin/skills/setup/SKILL.md b/docs/abca-plugin/skills/setup/SKILL.md index 928431ed..ce693600 100644 --- a/docs/abca-plugin/skills/setup/SKILL.md +++ b/docs/abca-plugin/skills/setup/SKILL.md @@ -48,28 +48,33 @@ Run these steps in order, verifying each: 6. `mise run install` — Install all workspace dependencies 7. 
`mise run build` — Full monorepo build (agent quality + CDK + CLI + docs) -If `mise run install` fails with "yarn: command not found", Corepack wasn't activated. If `prek install` fails about `core.hooksPath`, another hook manager owns hooks — suggest `git config --unset-all core.hooksPath`. +Common Phase 2 snags to pre-empt (don't let these read as a broken environment): +- "yarn: command not found" → Corepack wasn't activated (step 3). +- `prek install` fails about `core.hooksPath` → another hook manager owns hooks; suggest `git config --unset-all core.hooksPath`. +- Node, Yarn, AND CDK all "not found" at once → expected before `mise install` finishes; mise provisions them. +- `mise install` fails Node on GPG verification (headless/EC2, no gpg-agent) → `mise settings set node.gpg_verify false` (still checksum-verified), retry. +- "config not trusted" for `~/.config/mise/config.toml` → run `mise trust` on the user-global config too, not just the project one. +- In a non-interactive/spawned shell, `mise` may not be on `PATH` → use `~/.local/bin/mise` or `mise exec --`. -## Phase 3: One-Time AWS Setup +## Phase 3: One-Time Host Setup (build architecture) -On a fresh AWS account, X-Ray needs a CloudWatch Logs resource policy before it can write spans. Run both commands — the first creates the policy, the second sets the destination: +The agent image is built for **linux/arm64** (AgentCore runs on Graviton). On an **x86_64** build host this is the most common first-deploy blocker — the image build dies with `exec /bin/sh: exec format error`. 
Register QEMU emulation once per host: ```bash -ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text) -aws logs put-resource-policy \ - --policy-name xray-spans-policy \ - --policy-document "{\"Version\":\"2012-10-17\",\"Statement\":[{\"Sid\":\"XRaySpansAccess\",\"Effect\":\"Allow\",\"Principal\":{\"Service\":\"xray.amazonaws.com\"},\"Action\":[\"logs:PutLogEvents\",\"logs:CreateLogGroup\",\"logs:CreateLogStream\"],\"Resource\":[\"arn:aws:logs:*:${ACCOUNT_ID}:log-group:aws/spans\",\"arn:aws:logs:*:${ACCOUNT_ID}:log-group:aws/spans:*\"]}]}" -aws xray update-trace-segment-destination --destination CloudWatchLogs +docker run --privileged --rm tonistiigi/binfmt --install arm64 ``` -These must be run once per AWS account before first deployment. If the `put-resource-policy` step is skipped, the `update-trace-segment-destination` command fails with `AccessDeniedException`. +If `docker run --privileged` is blocked (security-managed hosts), deploy from a **native arm64 host** (Graviton EC2 / Apple Silicon) instead. On Apple Silicon / arm64 hosts, skip this phase. + +**X-Ray tracing is OPTIONAL — do not gate deployment on it.** The stack ships with X-Ray→CloudWatch-Logs export disabled (`tracingEnabled` in `agent.ts`), so it deploys and runs fully without any X-Ray setup. Do NOT run `aws xray update-trace-segment-destination` as a prerequisite — on a security-managed AWS Org account an SCP can make that call fail with `AccessDeniedException` no matter what, dead-ending the user on a step the platform doesn't use. Mention tracing only as an opt-in extra. ## Phase 4: First Deployment Guide through: 1. `mise //cdk:bootstrap` — Bootstrap CDK (if not already done for this account/region) -2. `mise //cdk:deploy` — Deploy the stack (~9.5 minutes) +2. `mise //cdk:deploy -- --require-approval never` — Deploy the stack (~9.5 minutes). The flag avoids the approval prompt hanging in a non-interactive shell. 
+ - If the deploy rolls back on a missing IAM permission and lands in `ROLLBACK_COMPLETE`, the stack can't be updated — `mise //cdk:destroy` then redeploy. Teardown can stall in `DELETE_FAILED` for ~20–40 min while AgentCore's service-managed (Hyperplane) ENIs are reclaimed; wait, then retry destroy. Never force-delete past stuck VPC resources (orphans the VPC; VPCs are quota-capped per Region). 3. Retrieve stack outputs: ```bash aws cloudformation describe-stacks --stack-name backgroundagent-dev \ diff --git a/docs/guides/QUICK_START.mdx b/docs/guides/QUICK_START.mdx index 719892b5..45e85151 100644 --- a/docs/guides/QUICK_START.mdx +++ b/docs/guides/QUICK_START.mdx @@ -16,12 +16,22 @@ Install these before you begin: - **AWS account** with credentials configured (`aws configure`). If you use named profiles, set `AWS_PROFILE` before running any commands in this guide. - **Amazon Bedrock** — The agent invokes Claude through Bedrock. IAM `grantInvoke` in the CDK stack is required but **not sufficient**: your account must also satisfy [Amazon Bedrock model access](https://docs.aws.amazon.com/bedrock/latest/userguide/model-access.html) for the model you use (including Anthropic first-time use where applicable, Marketplace subscription flow on first serverless use, and a valid payment method for Marketplace-backed models). See **Amazon Bedrock before your first task** after Step 3. -- **Docker** - for building the agent container image +- **Docker** - for building the agent container image. On an **x86_64** host you also need QEMU/binfmt to build the arm64 (Graviton) image — see the caution in Step 3. 
- **Node.js** v20 or later (Node 24 is the supported maximum — see CI matrix) - **mise** - task runner ([install guide](https://mise.jdx.dev/getting-started.html)) - **AWS CDK CLI** - `npm install -g aws-cdk` (after mise is active) - **GitHub account** — You need a [GitHub profile](https://github.com/join) to fork the sample repository and create a **fine-grained personal access token (PAT)** the agent uses to push branches and open pull requests. A free github.com account is sufficient. +:::note[mise provisions Node, Yarn, and the CDK CLI for you] + +If a prerequisite check reports Node, Yarn, **and** CDK as "not found" all at once, your environment isn't broken — mise installs those tool versions in Step 1, so they only appear on `PATH` after `mise install` + shell activation. A few mise gotchas worth knowing up front: + +- **Non-interactive / spawned shells** (CI, scripts, some agent runners) don't pick up mise's shell hook, so `mise`/`node`/`yarn` may not be on `PATH`. Use the full path `~/.local/bin/mise` or prefix commands with `mise exec --`. +- **Headless Linux** (EC2, containers) often lacks a working `gpg-agent`, so `mise install` can fail Node on GPG signature verification. If so, run `mise settings set node.gpg_verify false` (downloads are still checksum-verified) and retry. +- **Two trust prompts**: you may need to `mise trust` both the project `mise.toml` *and* your user-global `~/.config/mise/config.toml` if you change a global setting. + +::: + ## Step 1 - Clone and install This project uses [mise](https://mise.jdx.dev/) to manage tool versions (Node.js, Python, security scanners) and run tasks across the monorepo. Yarn Classic handles JavaScript workspaces (`cdk/`, `cli/`, `docs/`). @@ -46,6 +56,12 @@ mise run build `mise run install` installs all JavaScript and Python dependencies across the monorepo. `mise run build` compiles the CDK app, the CLI, the agent image, and the docs site. A successful build means you are ready to deploy. 
+:::note[Expect noisy build output — trust the exit code] + +`mise run build` prints many `ERROR`/`WARN` lines, cdk-nag warnings, and sometimes `A worker process has failed to exit gracefully`. These are **benign** — test fixtures deliberately exercise failure paths. The only reliable success signal is the **exit code: `0` means the build passed.** Don't be alarmed by the log volume. + +::: + :::note[ABCA CLI commands in this guide] This Quick Start does **not** assume a global `bgagent` install (`npm install -g …`). Step 1 compiles the CLI into `cli/lib/`. On every **ABCA CLI** tab below, run from the `cli/` directory: @@ -161,22 +177,35 @@ Fine-grained tokens only work for repos you own (or orgs that have opted in). If The CDK stack deploys the full platform: API Gateway, Lambda functions (orchestrator, task CRUD, webhooks), DynamoDB tables, AgentCore Runtime, VPC with network isolation, Cognito user pool, and CloudWatch dashboards. ```bash -# One-time account setup: allow X-Ray to write spans to CloudWatch Logs. -# On a fresh account, X-Ray needs a resource policy before the destination can be set. -ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text) -aws logs put-resource-policy \ - --policy-name xray-spans-policy \ - --policy-document "{\"Version\":\"2012-10-17\",\"Statement\":[{\"Sid\":\"XRaySpansAccess\",\"Effect\":\"Allow\",\"Principal\":{\"Service\":\"xray.amazonaws.com\"},\"Action\":[\"logs:PutLogEvents\",\"logs:CreateLogGroup\",\"logs:CreateLogStream\"],\"Resource\":[\"arn:aws:logs:*:${ACCOUNT_ID}:log-group:aws/spans\",\"arn:aws:logs:*:${ACCOUNT_ID}:log-group:aws/spans:*\"]}]}" -aws xray update-trace-segment-destination --destination CloudWatchLogs - # Bootstrap CDK (first time only) mise //cdk:bootstrap -# Deploy the stack (~10 minutes) -mise //cdk:deploy +# Deploy the stack (~10 minutes). --require-approval never lets it run +# unattended; drop the flag if you want to review IAM/security-group changes. 
+mise //cdk:deploy -- --require-approval never ``` -The X-Ray commands are a one-time per-account setup. On a fresh account the `put-resource-policy` call is required first — without it, the `update-trace-segment-destination` command fails with an `AccessDeniedException` because X-Ray cannot write to the `aws/spans` log group. CDK bootstrap provisions the staging resources CDK needs (S3 bucket, IAM roles). The deploy itself takes around 10 minutes - most of the time is spent building the Docker image and provisioning the AgentCore Runtime. +CDK bootstrap provisions the staging resources CDK needs (S3 bucket, IAM roles). The deploy itself takes around 10 minutes — most of the time is spent building the Docker image and provisioning the AgentCore Runtime. + +:::caution[Building on an x86 host? Register arm64 emulation first.] + +The agent runs on AgentCore (**Graviton / arm64**), so the container image is built for `linux/arm64`. On an **x86_64** build host (most cloud dev boxes, many laptops), the image build fails partway with `exec /bin/sh: exec format error` unless QEMU emulation is registered: + +```bash +docker run --privileged --rm tonistiigi/binfmt --install arm64 +``` + +If `docker run --privileged` is blocked on your host (common on security-managed machines), build from a **native arm64 host** instead (Graviton EC2 or Apple Silicon) — no emulation needed. Apple Silicon and other arm64 hosts can skip this step entirely. Note that the emulated build is noticeably slower than a native one. + +::: + +:::note[X-Ray → CloudWatch Logs tracing is optional — skip it for your first deploy.] + +The stack ships with X-Ray span export to CloudWatch Logs **disabled** (`tracingEnabled` in `cdk/src/stacks/agent.ts`), so the platform deploys and runs fully without any X-Ray account setup. You do **not** need to run `aws xray update-trace-segment-destination` to get a working deployment. 
+ +If you later enable tracing, it needs a one-time per-account setup — and on a security-managed AWS Organization account an SCP may block the X-Ray service from writing to `aws/spans` entirely (the `update-trace-segment-destination` call returns `AccessDeniedException` no matter what the deploying identity does; that's an Org-level policy, not a setup mistake). Treat tracing as an opt-in extra, not a prerequisite. See the [Developer Guide](./DEVELOPER_GUIDE.md) for the enable steps. + +::: ### Amazon Bedrock before your first task @@ -479,10 +508,12 @@ Here is what the platform did after you ran `node lib/bin/bgagent.js submit`: |---|---|---| | `yarn: command not found` | Corepack not enabled or mise not activated in your shell | Run `eval "$(mise activate zsh)"`, then `corepack enable && corepack prepare yarn@1.22.22 --activate` | | `MISE_EXPERIMENTAL required` | Namespaced tasks need the experimental flag | `export MISE_EXPERIMENTAL=1` | -| `AccessDeniedException` on `update-trace-segment-destination` | Fresh account missing CloudWatch Logs resource policy for X-Ray | Run `aws logs put-resource-policy` first (see Step 3) | -| CDK deploy fails with "X-Ray Delivery Destination..." 
| Missing one-time account setup | Run both X-Ray commands in Step 3 | +| `exec /bin/sh: exec format error` during image build | x86 host building the arm64 (Graviton) agent image without QEMU/binfmt | Run `docker run --privileged --rm tonistiigi/binfmt --install arm64`, or build from a native arm64 host (see Step 3 caution) | +| `AccessDeniedException` on `update-trace-segment-destination` | X-Ray → CloudWatch Logs tracing is optional and the call may be SCP-blocked on Org accounts | **Skip it** — tracing is disabled by default and not required to deploy (see Step 3 note) | | `mise run build` fails with `ec2:DescribeAvailabilityZones` error | AWS credentials missing or insufficient for CDK synth | Set `AWS_PROFILE` or configure credentials with at least EC2 read access | -| CDK deploy prompts for approval and hangs | Non-interactive terminal (CI/CD, scripts) | Pass `--require-approval never` to `cdk deploy` or use an interactive terminal | +| CDK deploy prompts for approval and hangs | Non-interactive terminal (CI/CD, scripts) | Pass `--require-approval never` to `cdk deploy` (Step 3 uses it) or use an interactive terminal | +| Deploy rolled back; can't redeploy (`ROLLBACK_COMPLETE`) | A first-create failure leaves the stack un-updatable | `mise //cdk:destroy` (or delete the stack), then deploy again. Do **not** force-delete past stuck VPC resources — it orphans the VPC, and VPCs are quota-capped per Region | +| Stack stuck in `DELETE_FAILED` on a security group / subnet | AgentCore's service-managed (Hyperplane) ENIs reclaim asynchronously after the runtime is gone | Wait ~20–40 min for AWS to release the ENIs, then retry `mise //cdk:destroy`. 
You cannot force-detach an `amazon-aws`-owned ENI | | `put-secret-value` returns double-dot endpoint | `REGION` variable is empty | Set `REGION=us-east-1` (or your actual region) before running the command | | Model / Bedrock errors in logs (`not available on your bedrock`, zero tokens) | Model not entitled for the account or Region, wrong `modelId` shape, or missing Marketplace / FTU steps | Follow **Amazon Bedrock before your first task** above; confirm [model access](https://docs.aws.amazon.com/bedrock/latest/userguide/model-access.html) and use an [inference profile](https://docs.aws.amazon.com/bedrock/latest/userguide/inference-profiles-use.html) ID such as `us.anthropic.claude-sonnet-4-6` where required; keep `grantInvoke` in `agent.ts` aligned with that model | | `REPO_NOT_ONBOARDED` on task submit | Blueprint `repo` does not match what you passed to the CLI | Confirm `BLUEPRINT_REPO`, CDK context `blueprintRepo`, or the `repo` prop on the `Blueprint` in `cdk/src/stacks/agent.ts` resolves to exactly the same `owner/repo` you pass to the CLI | diff --git a/docs/src/content/docs/getting-started/Quick-start.mdx b/docs/src/content/docs/getting-started/Quick-start.mdx index 766c8cb8..045c5a5d 100644 --- a/docs/src/content/docs/getting-started/Quick-start.mdx +++ b/docs/src/content/docs/getting-started/Quick-start.mdx @@ -16,12 +16,22 @@ Install these before you begin: - **AWS account** with credentials configured (`aws configure`). If you use named profiles, set `AWS_PROFILE` before running any commands in this guide. - **Amazon Bedrock** — The agent invokes Claude through Bedrock. 
IAM `grantInvoke` in the CDK stack is required but **not sufficient**: your account must also satisfy [Amazon Bedrock model access](https://docs.aws.amazon.com/bedrock/latest/userguide/model-access.html) for the model you use (including Anthropic first-time use where applicable, Marketplace subscription flow on first serverless use, and a valid payment method for Marketplace-backed models). See **Amazon Bedrock before your first task** after Step 3. -- **Docker** - for building the agent container image +- **Docker** - for building the agent container image. On an **x86_64** host you also need QEMU/binfmt to build the arm64 (Graviton) image — see the caution in Step 3. - **Node.js** v20 or later (Node 24 is the supported maximum — see CI matrix) - **mise** - task runner ([install guide](https://mise.jdx.dev/getting-started.html)) - **AWS CDK CLI** - `npm install -g aws-cdk` (after mise is active) - **GitHub account** — You need a [GitHub profile](https://github.com/join) to fork the sample repository and create a **fine-grained personal access token (PAT)** the agent uses to push branches and open pull requests. A free github.com account is sufficient. +:::note[mise provisions Node, Yarn, and the CDK CLI for you] + +If a prerequisite check reports Node, Yarn, **and** CDK as "not found" all at once, your environment isn't broken — mise installs those tool versions in Step 1, so they only appear on `PATH` after `mise install` + shell activation. A few mise gotchas worth knowing up front: + +- **Non-interactive / spawned shells** (CI, scripts, some agent runners) don't pick up mise's shell hook, so `mise`/`node`/`yarn` may not be on `PATH`. Use the full path `~/.local/bin/mise` or prefix commands with `mise exec --`. +- **Headless Linux** (EC2, containers) often lacks a working `gpg-agent`, so `mise install` can fail Node on GPG signature verification. If so, run `mise settings set node.gpg_verify false` (downloads are still checksum-verified) and retry. 
+- **Two trust prompts**: you may need to `mise trust` both the project `mise.toml` *and* your user-global `~/.config/mise/config.toml` if you change a global setting.
+
+:::
+
 ## Step 1 - Clone and install
 
 This project uses [mise](https://mise.jdx.dev/) to manage tool versions (Node.js, Python, security scanners) and run tasks across the monorepo. Yarn Classic handles JavaScript workspaces (`cdk/`, `cli/`, `docs/`).
@@ -46,6 +56,12 @@ mise run build
 
 `mise run install` installs all JavaScript and Python dependencies across the monorepo. `mise run build` compiles the CDK app, the CLI, the agent image, and the docs site. A successful build means you are ready to deploy.
 
+:::note[Expect noisy build output — trust the exit code]
+
+`mise run build` prints many `ERROR`/`WARN` lines, cdk-nag warnings, and sometimes `A worker process has failed to exit gracefully`. These are **benign** — test fixtures deliberately exercise failure paths. The only reliable success signal is the **exit code: `0` means the build passed.** Don't be alarmed by the log volume.
+
+:::
+
 :::note[ABCA CLI commands in this guide]
 
 This Quick Start does **not** assume a global `bgagent` install (`npm install -g …`). Step 1 compiles the CLI into `cli/lib/`. On every **ABCA CLI** tab below, run from the `cli/` directory:
@@ -161,22 +177,35 @@ Fine-grained tokens only work for repos you own (or orgs that have opted in). If
 The CDK stack deploys the full platform: API Gateway, Lambda functions (orchestrator, task CRUD, webhooks), DynamoDB tables, AgentCore Runtime, VPC with network isolation, Cognito user pool, and CloudWatch dashboards.
 
 ```bash
-# One-time account setup: allow X-Ray to write spans to CloudWatch Logs.
-# On a fresh account, X-Ray needs a resource policy before the destination can be set.
-ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
-aws logs put-resource-policy \
-  --policy-name xray-spans-policy \
-  --policy-document "{\"Version\":\"2012-10-17\",\"Statement\":[{\"Sid\":\"XRaySpansAccess\",\"Effect\":\"Allow\",\"Principal\":{\"Service\":\"xray.amazonaws.com\"},\"Action\":[\"logs:PutLogEvents\",\"logs:CreateLogGroup\",\"logs:CreateLogStream\"],\"Resource\":[\"arn:aws:logs:*:${ACCOUNT_ID}:log-group:aws/spans\",\"arn:aws:logs:*:${ACCOUNT_ID}:log-group:aws/spans:*\"]}]}"
-aws xray update-trace-segment-destination --destination CloudWatchLogs
-
 # Bootstrap CDK (first time only)
 mise //cdk:bootstrap
 
-# Deploy the stack (~10 minutes)
-mise //cdk:deploy
+# Deploy the stack (~10 minutes). --require-approval never lets it run
+# unattended; drop the flag if you want to review IAM/security-group changes.
+mise //cdk:deploy -- --require-approval never
 ```
 
-The X-Ray commands are a one-time per-account setup. On a fresh account the `put-resource-policy` call is required first — without it, the `update-trace-segment-destination` command fails with an `AccessDeniedException` because X-Ray cannot write to the `aws/spans` log group. CDK bootstrap provisions the staging resources CDK needs (S3 bucket, IAM roles). The deploy itself takes around 10 minutes - most of the time is spent building the Docker image and provisioning the AgentCore Runtime.
+CDK bootstrap provisions the staging resources CDK needs (S3 bucket, IAM roles). The deploy itself takes around 10 minutes — most of the time is spent building the Docker image and provisioning the AgentCore Runtime.
+
+:::caution[Building on an x86 host? Register arm64 emulation first.]
+
+The agent runs on AgentCore (**Graviton / arm64**), so the container image is built for `linux/arm64`.
On an **x86_64** build host (most cloud dev boxes, many laptops), the image build fails partway with `exec /bin/sh: exec format error` unless QEMU emulation is registered:
+
+```bash
+docker run --privileged --rm tonistiigi/binfmt --install arm64
+```
+
+If `docker run --privileged` is blocked on your host (common on security-managed machines), build from a **native arm64 host** instead (Graviton EC2 or Apple Silicon) — no emulation needed. Apple Silicon and other arm64 hosts can skip this step entirely. Note that the emulated build is noticeably slower than a native one.
+
+:::
+
+:::note[X-Ray → CloudWatch Logs tracing is optional — skip it for your first deploy.]
+
+The stack ships with X-Ray span export to CloudWatch Logs **disabled** (`tracingEnabled` in `cdk/src/stacks/agent.ts`), so the platform deploys and runs fully without any X-Ray account setup. You do **not** need to run `aws xray update-trace-segment-destination` to get a working deployment.
+
+If you later enable tracing, it needs a one-time per-account setup — and on a security-managed AWS Organization account an SCP may block the X-Ray service from writing to `aws/spans` entirely (the `update-trace-segment-destination` call returns `AccessDeniedException` no matter what the deploying identity does; that's an Org-level policy, not a setup mistake). Treat tracing as an opt-in extra, not a prerequisite. See the [Developer Guide](/developer-guide/introduction) for the enable steps.
+
+:::
 
 ### Amazon Bedrock before your first task
 
@@ -479,10 +508,12 @@ Here is what the platform did after you ran `node lib/bin/bgagent.js submit`:
 |---|---|---|
 | `yarn: command not found` | Corepack not enabled or mise not activated in your shell | Run `eval "$(mise activate zsh)"`, then `corepack enable && corepack prepare yarn@1.22.22 --activate` |
 | `MISE_EXPERIMENTAL required` | Namespaced tasks need the experimental flag | `export MISE_EXPERIMENTAL=1` |
-| `AccessDeniedException` on `update-trace-segment-destination` | Fresh account missing CloudWatch Logs resource policy for X-Ray | Run `aws logs put-resource-policy` first (see Step 3) |
-| CDK deploy fails with "X-Ray Delivery Destination..." | Missing one-time account setup | Run both X-Ray commands in Step 3 |
+| `exec /bin/sh: exec format error` during image build | x86 host building the arm64 (Graviton) agent image without QEMU/binfmt | Run `docker run --privileged --rm tonistiigi/binfmt --install arm64`, or build from a native arm64 host (see Step 3 caution) |
+| `AccessDeniedException` on `update-trace-segment-destination` | X-Ray → CloudWatch Logs tracing is optional and the call may be SCP-blocked on Org accounts | **Skip it** — tracing is disabled by default and not required to deploy (see Step 3 note) |
 | `mise run build` fails with `ec2:DescribeAvailabilityZones` error | AWS credentials missing or insufficient for CDK synth | Set `AWS_PROFILE` or configure credentials with at least EC2 read access |
-| CDK deploy prompts for approval and hangs | Non-interactive terminal (CI/CD, scripts) | Pass `--require-approval never` to `cdk deploy` or use an interactive terminal |
+| CDK deploy prompts for approval and hangs | Non-interactive terminal (CI/CD, scripts) | Pass `--require-approval never` to `cdk deploy` (Step 3 uses it) or use an interactive terminal |
+| Deploy rolled back; can't redeploy (`ROLLBACK_COMPLETE`) | A first-create failure leaves the stack un-updatable | `mise //cdk:destroy` (or
delete the stack), then deploy again. Do **not** force-delete past stuck VPC resources — it orphans the VPC, and VPCs are quota-capped per Region |
+| Stack stuck in `DELETE_FAILED` on a security group / subnet | AgentCore's service-managed (Hyperplane) ENIs reclaim asynchronously after the runtime is gone | Wait ~20–40 min for AWS to release the ENIs, then retry `mise //cdk:destroy`. You cannot force-detach an `amazon-aws`-owned ENI |
 | `put-secret-value` returns double-dot endpoint | `REGION` variable is empty | Set `REGION=us-east-1` (or your actual region) before running the command |
 | Model / Bedrock errors in logs (`not available on your bedrock`, zero tokens) | Model not entitled for the account or Region, wrong `modelId` shape, or missing Marketplace / FTU steps | Follow **Amazon Bedrock before your first task** above; confirm [model access](https://docs.aws.amazon.com/bedrock/latest/userguide/model-access.html) and use an [inference profile](https://docs.aws.amazon.com/bedrock/latest/userguide/inference-profiles-use.html) ID such as `us.anthropic.claude-sonnet-4-6` where required; keep `grantInvoke` in `agent.ts` aligned with that model |
 | `REPO_NOT_ONBOARDED` on task submit | Blueprint `repo` does not match what you passed to the CLI | Confirm `BLUEPRINT_REPO`, CDK context `blueprintRepo`, or the `repo` prop on the `Blueprint` in `cdk/src/stacks/agent.ts` resolves to exactly the same `owner/repo` you pass to the CLI |