docs/abca-plugin/skills/onboard-repo/SKILL.md (197 changes: 105 additions, 92 deletions)
@@ -1,139 +1,152 @@
---
name: onboard-repo
description: >-
Onboard a new GitHub repository to the ABCA platform by adding a Blueprint CDK
construct. Use when the user says "onboard a repo", "add a repository",
"register a repo", "new repo", "Blueprint construct", "REPO_NOT_ONBOARDED error",
or gets a 422 error about an unregistered repository.
Onboard a new GitHub repository to the ABCA platform so the agent can target it.
Use when the user says "onboard a repo", "add a repository", "register a repo",
"new repo", or gets a `REPO_NOT_ONBOARDED` / 422 error about an unregistered
repository.
---

# Repository Onboarding

You are guiding the user through onboarding a new GitHub repository to ABCA. Repositories must be registered as `Blueprint` constructs in the CDK stack before tasks can target them.
You are helping an **operator** register a GitHub repository with their running ABCA
deployment so tasks can target it.

## Step 1: Gather Repository Details
There are two paths.

Use AskUserQuestion to collect:
- **Repository**: GitHub `owner/repo` format
- **Compute type**: `agentcore` (default) or `ecs`
- **Model preference**: Claude Sonnet 4 (default), Claude Opus 4 (complex repos), or Claude Haiku (lightweight). **Important:** Models must be specified using their cross-region inference profile ID (e.g. `us.anthropic.claude-opus-4-20250514-v1:0`), not the raw foundation model ID. On-demand invocation of raw model IDs is not supported for most models.
- **Max turns**: Default 100 (range: 1-500)
- **Max budget**: USD cost ceiling per task (optional)
- **Custom GitHub PAT**: If this repo needs a different token than the platform default
**Prefer the CLI operator path (Path A)** when the repo can run on the
**platform/default-blueprint** setup — the default GitHub token secret, a model
already granted to the runtime, and the default egress allowlist. It's a single
runtime command against the deployed stack: no code change, no redeploy.

## Step 2: Read the Current Stack
**Use the CDK Blueprint path (Path B)** when the repo needs its **own** config that
the CLI can't provision at runtime — a per-repo GitHub token, a model not yet
granted to the runtime, custom egress domains, Cedar HITL policies, or
system-prompt overrides. These are baked into infrastructure and require a redeploy
(with the correct permissions). When in doubt, start with Path A; if a task later
fails on a missing token / model grant / blocked egress, promote the repo to a
Blueprint.
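Choosing between the two paths depends on one question: does any per-repo need have to be baked into infrastructure? The sketch below expresses that rule. `RepoNeeds` and `chooseOnboardingPath` are hypothetical names, and neither `bgagent` nor the stack exposes them. Each flag corresponds to one of the Path B triggers listed above.

```typescript
// Hypothetical: one flag per Path B trigger listed in this skill.
interface RepoNeeds {
  perRepoGithubToken: boolean;    // repo needs its own GitHub token
  modelNotYetGranted: boolean;    // model the runtime has no grantInvoke for
  customEgressDomains: boolean;   // domains beyond the default egress allowlist
  cedarHitlPolicies: boolean;     // Cedar human-in-the-loop policies
  systemPromptOverrides: boolean; // per-repo system prompt overrides
}

// Default to Path A (CLI, no redeploy). Any need that must be baked into
// infrastructure promotes the repo to Path B (CDK Blueprint + redeploy).
function chooseOnboardingPath(needs: RepoNeeds): "A" | "B" {
  const needsInfrastructure = Object.values(needs).some((flag) => flag);
  return needsInfrastructure ? "B" : "A";
}
```

Path A is the fallthrough case. This matches the advice to start with the CLI and promote a repo to a Blueprint only after a task fails on a missing token, model grant, or egress rule.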

Read the CDK stack file to understand existing Blueprint definitions:
> **This is an operation, not a contribution.** Onboarding a repo into your own
> deployment writes a record to the platform's RepoTable — it is **not** a change to
> the `aws-samples` codebase, so the ADR-003 contribution flow (GitHub issue →
> approval → feature branch) does **not** apply. Only invoke ADR-003 if the user is
> actually changing the platform source (e.g. wiring a brand-new Bedrock model into
> the stack — see "Model not yet wired into the runtime" below).

## Path A — CLI operator onboarding (default)

`bgagent repo onboard` writes (or re-activates) the repository's `RepoConfig` row in
the deployed RepoTable directly. It takes effect immediately — **no `agent.ts` edit,
no `cdk deploy`.**

```bash
bgagent repo onboard <owner/repo>
# common overrides:
# --model <inference-profile-id> e.g. us.anthropic.claude-sonnet-4-6
# --compute-type <agentcore|ecs>
# --max-turns <n>
# --token-secret-arn <arn> per-repo GitHub token (else platform default)
```
Read cdk/src/stacks/agent.ts

Then confirm it landed:

```bash
bgagent repo list # status should be "active"
bgagent repo show <owner/repo> # full resolved config (secret ARNs redacted)
```

Identify:
- Where existing Blueprint constructs are defined
- The `repoTable` reference used
- Any patterns for compute/model overrides
That's it — the repo is onboarded. Submit a task with the `submit-task` skill.
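When you onboard several repositories from a script, it helps to build the `bgagent repo onboard` arguments in one place. The sketch below uses only the flags shown above (`--model`, `--compute-type`, `--max-turns`, `--token-secret-arn`). The `OnboardOptions` type and `onboardArgs` helper are hypothetical. The `owner/repo` check is a loose assumption, not GitHub's exact naming rules.

```typescript
// Hypothetical options type mirroring the documented CLI overrides.
interface OnboardOptions {
  model?: string;                    // cross-Region inference profile ID
  computeType?: "agentcore" | "ecs";
  maxTurns?: number;
  tokenSecretArn?: string;           // per-repo GitHub token secret
}

// Build argv for `bgagent repo onboard <owner/repo> [overrides]`.
// Options left undefined are omitted, so platform defaults apply.
function onboardArgs(repo: string, opts: OnboardOptions = {}): string[] {
  if (!/^[^/\s]+\/[^/\s]+$/.test(repo)) {
    throw new Error(`expected owner/repo, got "${repo}"`);
  }
  const args = ["repo", "onboard", repo];
  if (opts.model !== undefined) args.push("--model", opts.model);
  if (opts.computeType !== undefined) args.push("--compute-type", opts.computeType);
  if (opts.maxTurns !== undefined) args.push("--max-turns", String(opts.maxTurns));
  if (opts.tokenSecretArn !== undefined) args.push("--token-secret-arn", opts.tokenSecretArn);
  return args;
}
```

You could pass the resulting array to Node's `child_process.execFileSync("bgagent", args)`. That call passes each argument as-is, without shell quoting.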

### Sample blueprint repo without a code change
**Pick a model that is already wired into the runtime.** With no `--model`, the repo
uses the platform default (Sonnet 4.6). If you pass `--model`, use a cross-Region
**inference profile ID** (e.g. `us.anthropic.claude-sonnet-4-6`), not a raw
`anthropic.*` foundation-model ID. Only models the stack has granted the runtime can
be invoked — see "Model not yet wired into the runtime" before choosing a model the
deployment doesn't already support.
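For the examples in this skill, the raw-ID versus profile-ID distinction is mechanical. The US geographic profile is the raw Anthropic model ID with a `us.` prefix: compare `anthropic.claude-opus-4-20250514-v1:0` with `us.anthropic.claude-opus-4-20250514-v1:0`. The sketch below is a check an operator script might run before passing `--model`. It covers only Anthropic models and the US geography, and the helper names are hypothetical.

```typescript
// Raw foundation-model IDs start with the provider name; the US geographic
// inference profile prepends "us.". Only Anthropic + US are handled here.
const RAW_ANTHROPIC_PREFIX = "anthropic.";
const US_PROFILE_PREFIX = "us.";

function isRawFoundationModelId(modelId: string): boolean {
  return modelId.startsWith(RAW_ANTHROPIC_PREFIX);
}

// Rewrite a raw Anthropic model ID to its US inference profile ID.
// IDs that already carry a profile prefix are returned unchanged.
function toUsInferenceProfileId(modelId: string): string {
  return isRawFoundationModelId(modelId) ? US_PROFILE_PREFIX + modelId : modelId;
}
```

A rewritten ID is only well-formed. The profile must still exist, be enabled for the account, and be granted to the runtime.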

The stack’s **AgentPlugins** blueprint uses a `repo` value resolved in this order: **`BLUEPRINT_REPO`** (environment variable) → CDK context **`blueprintRepo`** → default `awslabs/agent-plugins` (see `blueprintRepo` in `cdk/src/stacks/agent.ts`). If the user only needs to target a fork of the sample repo, they can set `export BLUEPRINT_REPO=owner/repo` or pass `-c blueprintRepo=owner/repo` (or set `"context": { "blueprintRepo": "..." }` in `cdk/cdk.json`) and redeploy, instead of adding a new `Blueprint` construct.
## Path B — CDK Blueprint (declarative / canonical)

## Step 3: Add the Blueprint Construct
Use this when the operator wants the repo committed to infrastructure-as-code (so a
fresh deploy re-creates it) rather than set as a runtime record. This **does** require
editing the stack and redeploying.

Add a new `Blueprint` construct instance to the stack. Follow the existing pattern. Example:
1. Read `cdk/src/stacks/agent.ts` to find where `Blueprint` constructs are defined and
the `repoTable` reference.
2. Add a construct following the existing pattern:

```typescript
new Blueprint(this, 'MyRepoBlueprint', {
repo: 'owner/repo',
repoTable: repoTable.table,
// Optional overrides:
// computeType: 'agentcore',
// modelId: 'us.anthropic.claude-sonnet-4-20250514-v1:0',
// maxTurns: 100,
// maxBudgetUsd: 50,
// runtimeArn: runtime.runtimeArn,
// githubTokenSecretArn: 'arn:aws:secretsmanager:...',
});
```
```typescript
new Blueprint(this, 'MyRepoBlueprint', {
repo: 'owner/repo',
repoTable: repoTable.table,
// Optional overrides:
// computeType: 'agentcore',
// modelId: 'us.anthropic.claude-sonnet-4-6',
// maxTurns: 100,
// maxBudgetUsd: 50,
// githubTokenSecretArn: 'arn:aws:secretsmanager:...',
});
```

Use a descriptive construct ID derived from the repo name.
3. Redeploy: `mise //cdk:compile` → `mise //cdk:diff` (show the diff) → `mise //cdk:deploy -- --require-approval never`.
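One way to get a descriptive, stable construct ID is to PascalCase the repository part of `owner/repo` and append `Blueprint`. This is an assumed convention, not one the stack enforces, and the helper name is hypothetical.

```typescript
// Hypothetical helper: derive a CDK construct ID from "owner/repo",
// e.g. "awslabs/agent-plugins" -> "AgentPluginsBlueprint".
function blueprintConstructId(repo: string): string {
  const name = repo.split("/").pop() ?? repo;
  const pascal = name
    .split(/[^A-Za-z0-9]+/)               // break on "-", "_", "." and similar
    .filter((part) => part.length > 0)
    .map((part) => part.charAt(0).toUpperCase() + part.slice(1))
    .join("");
  return `${pascal}Blueprint`;
}
```

Construct IDs must be unique within the stack. If two onboarded repos share a name under different owners, include the owner in the ID as well.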

### Model ID and IAM Permissions
> **Sample-repo shortcut:** the stack's AgentPlugins blueprint resolves its `repo` from
> `BLUEPRINT_REPO` (env) → CDK context `blueprintRepo` → default `awslabs/agent-plugins`.
> To target a fork of the sample without adding a construct, set
> `export BLUEPRINT_REPO=owner/repo` (or `cdk.json` context) and redeploy.
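The resolution order above is a chain of fallbacks. The sketch below writes it as a pure function. It assumes the caller passes `process.env` plus the value of `this.node.tryGetContext("blueprintRepo")` from inside the stack. This is not the code in `agent.ts`, and treating an empty `BLUEPRINT_REPO` as unset is an assumption.

```typescript
const DEFAULT_BLUEPRINT_REPO = "awslabs/agent-plugins";

// Sketch of the documented order: BLUEPRINT_REPO env var, then CDK context
// "blueprintRepo", then the default sample repo.
function resolveBlueprintRepo(
  env: Record<string, string | undefined>,
  contextValue: unknown,
): string {
  const fromEnv = env["BLUEPRINT_REPO"];
  if (fromEnv) return fromEnv; // non-empty env var wins
  if (typeof contextValue === "string" && contextValue.length > 0) return contextValue;
  return DEFAULT_BLUEPRINT_REPO;
}
```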

When specifying a non-default model via `agent.modelId`, three things are required:
## Model not yet wired into the runtime (the one real code change)

1. **Use the inference profile ID, not the raw model ID, when Bedrock requires it.** For `InvokeModel` / streaming, specify the cross-Region **inference profile** identifier (or ARN) where the Bedrock User Guide calls for it — not only the bare `anthropic.*` foundation model ID. Examples:
- Sonnet 4.6 (US geography profile): `us.anthropic.claude-sonnet-4-6`
- Sonnet 4: `us.anthropic.claude-sonnet-4-20250514-v1:0`
- Opus 4: `us.anthropic.claude-opus-4-20250514-v1:0`
- Haiku 4.5: `us.anthropic.claude-haiku-4-5-20251001-v1:0`
A repo can only use a model the **runtime IAM role has `grantInvoke` for**. As of now
the stack wires **Sonnet 4.6, Opus 4 (`claude-opus-4-20250514`), and Haiku 4.5** (see
the `grantInvoke` block in `agent.ts`). Onboarding a repo pinned to any **other** model
(e.g. Opus 4.8 / `us.anthropic.claude-opus-4-8`) will fail at invoke with a 403 — the
CLI onboard succeeds, but tasks can't run.
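Onboarding succeeds even with an unwired model, so an operator script can catch the mismatch before any task runs. In the sketch below, `WIRED_MODEL_PROFILES` and `checkModelWired` are hypothetical. The set is a hand-copied snapshot of the three wired models, using the US profile IDs listed in this file. It will drift whenever the `grantInvoke` block in `agent.ts` changes.

```typescript
// Snapshot of the models named as wired (Sonnet 4.6, Opus 4, Haiku 4.5),
// as US inference profile IDs. Keep in sync with agent.ts by hand.
const WIRED_MODEL_PROFILES: ReadonlySet<string> = new Set([
  "us.anthropic.claude-sonnet-4-6",
  "us.anthropic.claude-opus-4-20250514-v1:0",
  "us.anthropic.claude-haiku-4-5-20251001-v1:0",
]);

// Returns a warning for a model the runtime cannot invoke, or null.
// An undefined model means the platform default (Sonnet 4.6) is used.
function checkModelWired(modelId: string | undefined): string | null {
  if (modelId === undefined || WIRED_MODEL_PROFILES.has(modelId)) return null;
  return `${modelId} is not wired into the runtime; tasks would fail with 403 at invoke`;
}
```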

See [Use an inference profile in model invocation](https://docs.aws.amazon.com/bedrock/latest/userguide/inference-profiles-use.html).
Adding a new model **is** a platform source change, so it follows ADR-003 (issue →
approval → feature branch) and requires:

2. **Grant the runtime IAM permissions for the model.** The Blueprint construct does not automatically grant `bedrock:InvokeModel*` — this is by design (least privilege). You must add a `grantInvoke` block in the stack for each model used:
1. **Wire the model + inference profile and grant the runtime**, in `agent.ts`:
```typescript
const opusModel = new bedrock.BedrockFoundationModel('anthropic.claude-opus-4-20250514-v1:0', {
const model = new bedrock.BedrockFoundationModel('anthropic.claude-opus-4-8', {
supportsAgents: true,
supportsCrossRegion: true,
});
opusModel.grantInvoke(runtime);

const opusProfile = bedrock.CrossRegionInferenceProfile.fromConfig({
model.grantInvoke(runtime);
const profile = bedrock.CrossRegionInferenceProfile.fromConfig({
geoRegion: bedrock.CrossRegionInferenceProfileRegion.US,
model: opusModel,
model,
});
opusProfile.grantInvoke(runtime);
profile.grantInvoke(runtime);
```
then redeploy.
2. **Account-level Bedrock model access** (separate from IAM): the account must have the
model enabled for the Region — complete [model access](https://docs.aws.amazon.com/bedrock/latest/userguide/model-access.html)
prerequisites (Marketplace actions / Anthropic first-time use where applicable). For
cross-Region profiles, IAM and SCPs must allow Bedrock in source **and** destination
Regions.

3. **Account-level Bedrock model access (separate from IAM).** The runtime role must be allowed to invoke the model, and the **AWS account** must be able to use that model in Bedrock: complete [model access](https://docs.aws.amazon.com/bedrock/latest/userguide/model-access.html) prerequisites (AWS Marketplace actions on first serverless use where applicable, Anthropic first-time use / `PutUseCaseForModelAccess` for Anthropic models, valid payment method for Marketplace-backed models). For geographic cross-Region inference profiles, IAM and SCPs must allow Bedrock in **source and destination** Regions per [Supported Regions and models for inference profiles](https://docs.aws.amazon.com/bedrock/latest/userguide/inference-profiles-support.html).

## Step 4: Deploy

After adding the Blueprint, the stack must be redeployed:

```bash
export MISE_EXPERIMENTAL=1
mise //cdk:compile # Verify TypeScript compiles
mise //cdk:test # Run tests
mise //cdk:diff # Preview changes
```

Show the diff to the user. If it looks correct, ask if they want to deploy now.

```bash
mise //cdk:deploy
```

## Step 5: Verify

After deployment, verify the repo config was written to DynamoDB:

```bash
aws dynamodb scan --table-name <RepoTableName> \
--filter-expression "repo = :r" \
--expression-attribute-values '{":r":{"S":"owner/repo"}}' \
--output json
```
If the user just wants the agent working now, steer them to a wired model (Sonnet 4.6)
via Path A and treat "add model X" as a separate, later change.

## Per-Repository Configuration Reference
## Per-repository configuration reference

| Setting | Purpose | Default |
|---------|---------|---------|
| `compute_type` | Execution strategy | `agentcore` |
| `runtime_arn` | AgentCore runtime override | Platform default |
| `model_id` | AI model for tasks | Platform default (Sonnet 4) |
| `model_id` | AI model for tasks (inference profile ID) | Platform default (Sonnet 4.6) |
| `max_turns` | Turn limit per task | 100 |
| `max_budget_usd` | Cost ceiling per task | Unlimited |
| `system_prompt_overrides` | Custom system instructions | None |
| `github_token_secret_arn` | Repo-specific GitHub token | Platform default |
| `poll_interval_ms` | Completion polling frequency | 30000 ms (30 s) |

Task-level parameters override Blueprint defaults. If neither specifies a value, platform defaults apply.
Task-level parameters override per-repo defaults; if neither specifies a value, platform defaults apply.
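The precedence rule works as a field-by-field fallthrough. In the sketch below, field names follow the camelCase `Blueprint` props (`computeType`, `modelId`, `maxTurns`, `maxBudgetUsd`). `pollIntervalMs`, `TaskSettings`, and `resolveSettings` are hypothetical, and the defaults are copied from the table above.

```typescript
interface TaskSettings {
  computeType: "agentcore" | "ecs";
  modelId: string;
  maxTurns: number;
  maxBudgetUsd?: number; // undefined = unlimited
  pollIntervalMs: number;
}

// Platform defaults from the table (maxBudgetUsd omitted: unlimited).
const PLATFORM_DEFAULTS: TaskSettings = {
  computeType: "agentcore",
  modelId: "us.anthropic.claude-sonnet-4-6",
  maxTurns: 100,
  pollIntervalMs: 30000,
};

// Task-level values win over per-repo values, which win over platform
// defaults; a field left undefined at one level falls through to the next.
function resolveSettings(repo: Partial<TaskSettings>, task: Partial<TaskSettings>): TaskSettings {
  return {
    computeType: task.computeType ?? repo.computeType ?? PLATFORM_DEFAULTS.computeType,
    modelId: task.modelId ?? repo.modelId ?? PLATFORM_DEFAULTS.modelId,
    maxTurns: task.maxTurns ?? repo.maxTurns ?? PLATFORM_DEFAULTS.maxTurns,
    maxBudgetUsd: task.maxBudgetUsd ?? repo.maxBudgetUsd ?? PLATFORM_DEFAULTS.maxBudgetUsd,
    pollIntervalMs: task.pollIntervalMs ?? repo.pollIntervalMs ?? PLATFORM_DEFAULTS.pollIntervalMs,
  };
}
```

The sketch uses `??` rather than `||`. With `??`, an explicit falsy value such as `maxBudgetUsd: 0` is kept instead of silently falling through to the next level.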

## Common Issues
## Common issues

- **422 "Repository not onboarded"** — Blueprint hasn't been deployed yet. Add the construct and redeploy.
- **Preflight failures after onboarding** — GitHub PAT may lack permissions for the new repo. Check the PAT's fine-grained access includes the target repository with Contents (read/write) and Pull requests (read/write) permissions.
- **400 "Invocation with on-demand throughput isn't supported"** — The Blueprint `modelId` is using a raw foundation model ID instead of an inference profile ID. Change e.g. `anthropic.claude-opus-4-20250514-v1:0` to `us.anthropic.claude-opus-4-20250514-v1:0`.
- **403 "not authorized to perform bedrock:InvokeModelWithResponseStream"** — The runtime IAM role lacks permissions for the model specified in the Blueprint. Add `grantInvoke` for both the model and its cross-region inference profile in `agent.ts`.
- **Model not available / "not available on your Bedrock deployment"** — IAM is not the whole story: the account must meet [Bedrock model access](https://docs.aws.amazon.com/bedrock/latest/userguide/model-access.html) for that model family and Region, and `modelId` should be an **enabled** inference profile ID (for example `us.anthropic.claude-sonnet-4-6`) where Bedrock requires it. After fixing access in the console, align Blueprint / DynamoDB `model_id` and redeploy if you change IAM grants.
- **`REPO_NOT_ONBOARDED` / 422** — the repo isn't registered. Run `bgagent repo onboard <owner/repo>` (Path A). Confirm the `owner/repo` matches exactly what you pass to `bgagent submit --repo`.
- **Preflight failure after onboarding** — the GitHub PAT lacks access to the new repo. Ensure the token has Contents (read/write) + Pull requests (read/write) on it, or onboard with a repo-specific `--token-secret-arn`.
- **400 "Invocation with on-demand throughput isn't supported"** — `model_id` is a raw foundation-model ID; use the inference-profile ID (e.g. `us.anthropic.claude-sonnet-4-6`).
- **403 "not authorized to perform bedrock:InvokeModelWithResponseStream"** — the repo's model isn't wired into the runtime. See "Model not yet wired into the runtime."
- **Model not available / "not available on your Bedrock deployment"** — account-level Bedrock access isn't enabled for that model/Region (separate from IAM); complete [model access](https://docs.aws.amazon.com/bedrock/latest/userguide/model-access.html), then use an enabled inference-profile ID.
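The list above amounts to a lookup from error text to fix. The sketch below implements that lookup for triage scripts. `REMEDIES` and `suggestFix` are hypothetical. Matching on message substrings assumes the errors look like the quoted messages, and that wording may vary between Bedrock and `bgagent` versions.

```typescript
// Ordered patterns: the first match wins. The "isn.t" wildcard accepts
// either a straight or a curly apostrophe in Bedrock's message.
const REMEDIES: ReadonlyArray<[RegExp, string]> = [
  [/REPO_NOT_ONBOARDED|\b422\b/, "Run bgagent repo onboard <owner/repo>; check the name matches --repo exactly."],
  [/on-demand throughput isn.t supported/i, "Use an inference profile ID (e.g. us.anthropic.claude-sonnet-4-6), not a raw model ID."],
  [/not authorized to perform bedrock:InvokeModel/i, "The repo's model is not wired into the runtime; add grantInvoke in agent.ts or pick a wired model."],
  [/not available on your Bedrock deployment/i, "Enable Bedrock model access for the model and Region, then use an enabled inference profile ID."],
];

function suggestFix(message: string): string | undefined {
  for (const [pattern, fix] of REMEDIES) {
    if (pattern.test(message)) return fix;
  }
  return undefined;
}
```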
docs/abca-plugin/skills/setup/SKILL.md (25 changes: 15 additions, 10 deletions)
@@ -48,28 +48,33 @@ Run these steps in order, verifying each:
6. `mise run install` — Install all workspace dependencies
7. `mise run build` — Full monorepo build (agent quality + CDK + CLI + docs)

If `mise run install` fails with "yarn: command not found", Corepack wasn't activated. If `prek install` fails about `core.hooksPath`, another hook manager owns hooks — suggest `git config --unset-all core.hooksPath`.
Common Phase 2 snags to pre-empt (don't let these read as a broken environment):
- "yarn: command not found" → Corepack wasn't activated (step 3).
- `prek install` fails about `core.hooksPath` → another hook manager owns hooks; suggest `git config --unset-all core.hooksPath`.
- Node, Yarn, AND CDK all "not found" at once → expected before `mise install` finishes; mise provisions them.
- `mise install` fails Node on GPG verification (headless/EC2, no gpg-agent) → `mise settings set node.gpg_verify false` (still checksum-verified), retry.
- "config not trusted" for `~/.config/mise/config.toml` → run `mise trust` on the user-global config too, not just the project one.
- In a non-interactive/spawned shell, `mise` may not be on `PATH` → use `~/.local/bin/mise` or `mise exec --`.
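Tooling that spawns `mise` from a non-interactive process can make the PATH-then-fallback lookup in the last bullet explicit. In the sketch below, `findMise` is hypothetical. It assumes a POSIX host with a `:`-separated `PATH`, such as the EC2 and Apple Silicon hosts mentioned here. The `exists` predicate is injected so the function stays pure; in Node it could be `fs.existsSync`.

```typescript
// Search each PATH directory for "mise", then fall back to the
// ~/.local/bin/mise location suggested above. POSIX paths only.
function findMise(
  pathVar: string,
  homeDir: string,
  exists: (file: string) => boolean,
): string | undefined {
  for (const dir of pathVar.split(":")) {
    if (dir === "") continue; // skip empty PATH entries
    const candidate = `${dir}/mise`;
    if (exists(candidate)) return candidate;
  }
  const fallback = `${homeDir}/.local/bin/mise`;
  return exists(fallback) ? fallback : undefined;
}
```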

## Phase 3: One-Time AWS Setup
## Phase 3: One-Time Host Setup (build architecture)

On a fresh AWS account, X-Ray needs a CloudWatch Logs resource policy before it can write spans. Run both commands — the first creates the policy, the second sets the destination:
The agent image is built for **linux/arm64** (AgentCore runs on Graviton). On an **x86_64** build host this is the most common first-deploy blocker — the image build dies with `exec /bin/sh: exec format error`. Register QEMU emulation once per host:

```bash
ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
aws logs put-resource-policy \
--policy-name xray-spans-policy \
--policy-document "{\"Version\":\"2012-10-17\",\"Statement\":[{\"Sid\":\"XRaySpansAccess\",\"Effect\":\"Allow\",\"Principal\":{\"Service\":\"xray.amazonaws.com\"},\"Action\":[\"logs:PutLogEvents\",\"logs:CreateLogGroup\",\"logs:CreateLogStream\"],\"Resource\":[\"arn:aws:logs:*:${ACCOUNT_ID}:log-group:aws/spans\",\"arn:aws:logs:*:${ACCOUNT_ID}:log-group:aws/spans:*\"]}]}"
aws xray update-trace-segment-destination --destination CloudWatchLogs
docker run --privileged --rm tonistiigi/binfmt --install arm64
```

These must be run once per AWS account before first deployment. If the `put-resource-policy` step is skipped, the `update-trace-segment-destination` command fails with `AccessDeniedException`.
If `docker run --privileged` is blocked (security-managed hosts), deploy from a **native arm64 host** (Graviton EC2 / Apple Silicon) instead. On Apple Silicon / arm64 hosts, skip this phase.
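You can automate the host-architecture check, but each tool names architectures differently. On Linux, `uname -m` prints `x86_64` or `aarch64`, and on macOS it prints `arm64`. Node's `process.arch` reports `x64` or `arm64`, while Docker platforms use `amd64`/`arm64`. The sketch below uses hypothetical helper names. Treating any non-arm64 host like x86_64 is an assumption.

```typescript
// Normalize common spellings of the build host's architecture.
function normalizeArch(raw: string): "arm64" | "x64" | "other" {
  const a = raw.trim().toLowerCase();
  if (a === "arm64" || a === "aarch64") return "arm64";
  if (a === "x86_64" || a === "amd64" || a === "x64") return "x64";
  return "other";
}

// One-time command to run before building the linux/arm64 agent image,
// or null on a native arm64 host (Graviton, Apple Silicon), where it is skipped.
function arm64SetupCommand(hostArch: string): string | null {
  return normalizeArch(hostArch) === "arm64"
    ? null
    : "docker run --privileged --rm tonistiigi/binfmt --install arm64";
}
```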

**X-Ray tracing is OPTIONAL — do not gate deployment on it.** The stack ships with X-Ray→CloudWatch-Logs export disabled (`tracingEnabled` in `agent.ts`), so it deploys and runs fully without any X-Ray setup. Do NOT run `aws xray update-trace-segment-destination` as a prerequisite — on a security-managed AWS Org account an SCP can make that call fail with `AccessDeniedException` no matter what, dead-ending the user on a step the platform doesn't use. Mention tracing only as an opt-in extra.

## Phase 4: First Deployment

Guide through:

1. `mise //cdk:bootstrap` — Bootstrap CDK (if not already done for this account/region)
2. `mise //cdk:deploy` — Deploy the stack (~9.5 minutes)
2. `mise //cdk:deploy -- --require-approval never` — Deploy the stack (~9.5 minutes). The flag avoids the approval prompt hanging in a non-interactive shell.
- If the deploy rolls back on a missing IAM permission and lands in `ROLLBACK_COMPLETE`, the stack can't be updated — `mise //cdk:destroy` then redeploy. Teardown can stall in `DELETE_FAILED` for ~20–40 min while AgentCore's service-managed (Hyperplane) ENIs are reclaimed; wait, then retry destroy. Never force-delete past stuck VPC resources (orphans the VPC; VPCs are quota-capped per Region).
3. Retrieve stack outputs:
```bash
aws cloudformation describe-stacks --stack-name backgroundagent-dev \