From 3f13c7fc1bb2d2b96d63b4406c0404ff49361736 Mon Sep 17 00:00:00 2001 From: petrsnd Date: Fri, 5 Jun 2026 21:46:37 -0600 Subject: [PATCH 1/5] =?UTF-8?q?docs(agent-skills):=20capture=20HTTP=20voya?= =?UTF-8?q?ge=20scars=20=E2=80=94=20SkipServerCertValidation=20reserved=20?= =?UTF-8?q?param,=20http-api=20cookie+CSRF=20auth=20shape,=20form-body=20&?= =?UTF-8?q?=20TLS=20failure=20patterns,=20omit=20-IdentityProvider=20by=20?= =?UTF-8?q?default?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- .agents/skills/safeguard-ps-operations/SKILL.md | 12 ++++++++++-- .agents/skills/script-authoring/SKILL.md | 1 + docs/agent-reference/failure-patterns.md | 3 +++ docs/agent-reference/strategy-decision-tree.md | 3 ++- docs/guides/development-workflow.md | 4 ++-- docs/guides/testing-and-debugging.md | 2 +- tools/TestTool.ps1 | 3 +-- 7 files changed, 20 insertions(+), 8 deletions(-) diff --git a/.agents/skills/safeguard-ps-operations/SKILL.md b/.agents/skills/safeguard-ps-operations/SKILL.md index 31139a3..04f1569 100644 --- a/.agents/skills/safeguard-ps-operations/SKILL.md +++ b/.agents/skills/safeguard-ps-operations/SKILL.md @@ -106,6 +106,8 @@ Connect-Safeguard -Appliance -Insecure -DeviceCode Do not pre-ask whether the appliance has a valid certificate. Try secure; the error message tells both the operator and the agent unambiguously when `-Insecure` is needed. +**Omit `-IdentityProvider` by default.** `Connect-Safeguard` with no `-IdentityProvider` lets the appliance discover the user's provider from their username at PKCE time, which works for local, certificate, and every external provider (AD, LDAP, OIDC, SAML) the appliance has configured. Hardcoding `-IdentityProvider local` breaks login for any user not in the local IdP, which is the common case in production. Pass `-IdentityProvider` only when the operator has explicitly named the provider, or when a prior login failed with a provider-mismatch error. + ### Persist the session across iterations — serialize the token, never keep a long-running shell **Login budget = 1 per voyage.** Each `Connect-Safeguard -DeviceCode` (or `-Browser`) costs the operator real time and attention. Connect exactly once. @@ -114,15 +116,21 @@ Do not pre-ask whether the appliance has a valid certificate. Try secure; the er The only correct shape is short-lived sync `powershell -Command { ... }` calls. `$Global:SafeguardSession` holds **a short-lived bearer token, not a permanent credential** — valid for the rest of the voyage (typically several hours), safe to serialize to the gitignored per-session state directory, expires on its own. -**Step 1 — connect once and serialize.** Sync call; the shell exits when `Connect-Safeguard` returns (no `-NoExit`, no async): +**Step 1 — connect once and serialize, in the same process.** The connect call and the serialization step **must run in the same PowerShell process**: `$Global:SafeguardSession` dies with the process that called `Connect-Safeguard`, so a "connect in shell A, serialize in shell B" split silently throws the token away and burns a login. + +Do **not** pipe the cmdlet to `Out-Null` — the verification URL and short code are printed on its host stream as it waits for the device-code callback, and `Out-Null` (or any output redirection that suppresses host writes) hides them from the operator. The cmdlet's return value is the session object, which the agent does not need because `$Global:SafeguardSession` is the durable artifact; just let stdout pass through. + +In an agent runtime where sync stdout buffers until a cmdlet returns (and so hides the device-code block until after it has expired), run the connect + serialize sequence as one short-lived async invocation and read its stream as the device-code block lands. The operator-surfacing rule still applies: the agent must echo the URL, the code, and the 300-second expiry to the operator the moment they appear. ``` -Connect-Safeguard -Appliance -Insecure -DeviceCode | Out-Null +Connect-Safeguard -Appliance -Insecure -DeviceCode $Global:SafeguardSession | ConvertTo-Json -Depth 5 | Set-Content "$env:USERPROFILE\.copilot\session-state\\files\sg-session.json" -Encoding utf8 ``` +If `Connect-Safeguard` is called on its own without the serialize step in the same process, the token is lost and the next cmdlet call will prompt for `-Appliance`, hanging the agent and forcing a second login. That is a defect (see "Why explicit threading…" below), not a recoverable state. + **Step 2 — every subsequent cmdlet is its own fresh sync call.** Re-hydrate the saved session, normalize `Insecure` to a `[bool]` once, then thread `Appliance`, `AccessToken`, and the bool through every call: ``` diff --git a/.agents/skills/script-authoring/SKILL.md b/.agents/skills/script-authoring/SKILL.md index 263cf6d..ee416ae 100644 --- a/.agents/skills/script-authoring/SKILL.md +++ b/.agents/skills/script-authoring/SKILL.md @@ -68,6 +68,7 @@ Before declaring a draft "ready to import," cross-reference an analogous sample - **Top-level shape.** `Id`, `BackEnd: "Scriptable"`, optional `Meta`, optional `Imports`, then one object per operation (`CheckSystem`, `CheckPassword`, `ChangePassword`, …). Operation objects contain `Parameters` (array of single-key objects) and `Do` (array of command objects). See [`schema/custom-platform-script.schema.json`](../../../schema/custom-platform-script.schema.json) lines 14–80 for the top-level fields, and [`docs/reference/script-structure.md`](../../../docs/reference/script-structure.md) for prose. - **Reserved parameters** are not declared by the script — SPP injects them. Custom parameters are declared in `Parameters` and addressed as `%Name%`. See [`docs/reference/reserved-parameters.md`](../../../docs/reference/reserved-parameters.md) and [`docs/reference/custom-parameters.md`](../../../docs/reference/custom-parameters.md). +- **Before declaring a custom parameter, grep [`docs/reference/reserved-parameters.md`](../../../docs/reference/reserved-parameters.md) for the concept.** Reserved names like `SkipServerCertValidation`, `UseSsl`, `CheckHostKey`, `HostKey`, `HttpProxyUri`/`Port`/`UserName`/`Password`, and `TacacsSecret` are **auto-sourced from the asset's settings** — declare them in the operation's `Parameters` array exactly like a custom param and SPP populates them at runtime with zero `-CustomScriptParameters` plumbing at onboarding. Inventing a custom name for any of these concepts (e.g. declaring your own `IgnoreCert` Boolean) works in isolation but forces every onboarding operator to remember the override forever. Always prefer the reserved name. - **Secrets.** Any parameter that holds a credential MUST be `Type: "Secret"` so SPP redacts it in task logs (see the redaction note in [`safeguard-ps-operations`](../safeguard-ps-operations/SKILL.md) and [`tools/README.md`](../../../tools/README.md), "Secret handling"). Use the `::$` modifier (`%FuncPassword::$%`) where the templates and samples do; do not invent a different escape. - **`Try` / `Catch`.** Wrap fallible operations (network calls, command execution, parses) so a clean `Disconnect` still runs and a structured `Return`/`Throw` is produced. Both [`templates/TemplateSshMinimal.json`](../../../templates/TemplateSshMinimal.json) and [`templates/TemplateHttpMinimal.json`](../../../templates/TemplateHttpMinimal.json) demonstrate this shape end-to-end. - **Return values.** End each operation with `Return` (typically `%CheckResult%` or a discovery payload). Never let an operation fall off the end of `Do` without a return. diff --git a/docs/agent-reference/failure-patterns.md b/docs/agent-reference/failure-patterns.md index f919b23..2200f57 100644 --- a/docs/agent-reference/failure-patterns.md +++ b/docs/agent-reference/failure-patterns.md @@ -20,6 +20,7 @@ Each row is grounded in a real `Test-SafeguardCustomPlatformScript` response cap | --- | --- | --- | | `Function 'X' expects N parameters, but is being called with M` | Caller is passing the wrong number of positional args to an imported (or local) function. The public docs at `docs/reference/imports.md` deliberately do not list signatures because the deployed appliance's view can drift from any external reference. | Grep `samples/` for `"Name": "X"` and copy the `Parameters` array from a working call site that imports the same library. If no sample exercises the call, attempt the call with a guess and read the appliance's `expects N` error literally — the appliance is authoritative for its own deployed signature. Order in the array is positional; calls do not name their parameters. | | `60020: Invalid custom platform script. Reason: Platform definition is not valid JSON: Specified argument was out of the range of valid values.` (returned by `Test-SafeguardCustomPlatformScript` and `Import-SafeguardCustomPlatformScript` when the script body parses cleanly as JSON locally) | The appliance's script deserializer rejects a top-level `"$schema"` field even though the JSON Schema at `schema/custom-platform-script.schema.json` lists `$schema` as a permitted property. Adding `"$schema": "..."` for editor IntelliSense breaks server-side validation; local schema validation does not catch it because the field is schema-legal. | Remove the top-level `"$schema"` property before invoking `Test-` or `Import-`. Keep IntelliSense via the editor's JSON-schema mapping (`json.schemas` in VS Code settings, or a sibling `*.schema.json` reference) instead of an inline declaration. | +| `60020: Invalid custom platform script. Reason: ... The variable "Uri" is used at path "...SetItem.Value" but it is not declared in that scope` (returned by `Test-SafeguardCustomPlatformScript` for any script that uses `Uri.EscapeDataString(...)` — or any other bare `Type.Method(...)` call — inside a `%{ ... }%` expression, e.g. inside `SetItem.Value`) | The script-engine expression evaluator does not have an implicit binding for `System.Uri` (or any other base class library type referenced by bare name). Bare `Identifier.Method(...)` parses as `.Method(...)`; with no `Uri` variable in scope, the validator rejects the script. Calls like `.Method(...)` work because the receiver resolves to a real variable. Several on-disk templates — notably [`templates/Pattern-GenericRestApiBearerToken.json`](../../templates/Pattern-GenericRestApiBearerToken.json) — use `Uri.EscapeDataString` and therefore fail server validation despite passing local schema validation. | For form bodies, build a form via repeated `SetFormValue` (form is created automatically on first use, default `CreateForm: "CreateIfNotFound"`) and reference it from `Request.Content.ContentObjectName` with `ContentType: "application/x-www-form-urlencoded"` — the engine handles encoding. Cross-reference [`samples/http/twitter/CustomTwitter.json`](../../samples/http/twitter/CustomTwitter.json) lines 133–148. For URL path components, use the registered `UrlEncode` command into a result variable, then interpolate the variable into the URL via plain string substitution. Do **not** use bare-type method calls inside `%{ ... }%` expressions. | ## Asset-onboarding errors @@ -50,6 +51,8 @@ Each row is grounded in a real cmdlet response captured during authoring or onbo | `Sorry, try again.\r\nsudo: no password was provided\r\nsudo: 2 incorrect password attempts` (in a `Receive` buffer after a `(printf ... ) \| sudo -S ...` pipeline; the `Send` completed cleanly and `Connect` succeeded) | operation | Piping a password through a bash one-liner into `sudo -S` is brittle inside a PTY-allocated shell: the parent shell may strip or echo the password line in ways that defeat sudo's stdin read, and the only diagnostic ever printed is a generic "Sorry, try again". The pattern is also fragile to passwords containing shell metacharacters. The proven SSH password-rotation pattern in the repo never pipes the password — it sends `sudo passwd ` as a normal command, then walks through sudo's password prompt, the new-password prompt, and the retype-new-password prompt **as separate `Send`/`Receive` pairs**, with the password buffers marked `"ContainsSecret": true`. | Replace any `printf ... \| sudo -S ...` construct with the prompt-driven pattern from [`samples/ssh/generic-linux/GenericLinux.json`](../../samples/ssh/generic-linux/GenericLinux.json) lines 281–340 (`ChangeUserPassword`): `Send "sudo passwd ; echo CHGPASS=$?\n"`, then `Receive` the sudo prompt, `Send` `%FuncPassword%` (`ContainsSecret: true`, no surrounding quotes, no trailing `\n` — the appliance terminates secret sends itself), `Receive` `[Nn]ew.*[Pp]assword:`, `Send` `%NewPassword%`, `Receive` retype/new prompt, `Send` `%NewPassword%` again, `Receive` `CHGPASS=[0-9]+`. Capture the final buffer with `WriteResponseObject` so the task log shows the success/failure marker on the very next iteration. | | `Error in component 'SetItem': invalid expression: Ambiguous match found for: 'Split'` (raised by Z.Expressions when evaluating any `%{ ... .Split(':') ... }%` interpolation) | parse | The script-engine's expression evaluator (Z.Expressions) is bound against a modern .NET base class library where `string.Split(...)` has multiple overloads (`Split(char)`, `Split(char[])`, `Split(string)`, etc.). When the operand is a single-character literal like `':'`, the evaluator cannot pick between `Split(char)` and `Split(char[])` and throws `Ambiguous match found for: 'Split'`. The result is that **any** ssh-batch script using `someString.Split(':')[N]` to slice a colon-delimited shadow line, `passwd` entry, or fstab field will fail at the first `SetItem` that interpolates it — even though the same expression is legal C# in isolation. | **First ask whether you need to split at all.** If the value is going into `CompareShadowHash.SaltedHash`, do **not** pre-split — the component handler splits the shadow line on `:` internally (verified in Hercules `Source/Hercules.WebService/Common/Crypt/PasswordHash.cs` `CheckPasswordAgainstShadowEntry`, and demonstrated by [`samples/ssh/generic-linux/GenericLinux.json`](../../samples/ssh/generic-linux/GenericLinux.json) line 236 which passes `%AccountEntry%` whole). When a split is genuinely required, replace `.Split('')[]` with a `Regex.Match`-based extraction: `%{ Regex.Match(, "^[^:]*:([^:]*):").Groups[1].Value }%` for the second colon-delimited field, parameterizing the pattern for the field index needed. `Regex.Match` has a single overload at the appliance and is unambiguous. The same applies to any other ambiguous `string` method (`Replace`, `IndexOf`) when called with a single-character literal. | | Operation returns a clean verdict (e.g., `CheckResult: false`, `PasswordMismatch`) against a yescrypt-format `/etc/shadow` entry (`$y$j9T$...`) **after** a `Try`/`Catch` fired earlier in the operation. The catch's exception text — typically a Z.Expressions `Ambiguous match found for: 'Split'` from a `SetItem` that pre-extracted a hash field — is logged but the verdict surfaces unannotated. | operation (misdiagnosed) | The verdict is the **catch's fallback value, not the target's answer**. `CompareShadowHash` supports yescrypt (`$y$` prefix) via the `Yescrypt.IsYescrypt`/`Yescrypt.CheckPassword` branch in `PasswordHash.CheckPasswordAgainstHash`; the upstream `LinuxSshFunctions.json` import uses it on yescrypt-default Ubuntu/Debian systems. The actual root cause is a script-side bug in the path that runs **before** `CompareShadowHash` — most often pre-splitting the shadow line in a `SetItem` expression that hits the Split-overload-ambiguity row above, then a `Try`/`Catch` falls back to `auth-by-login` or returns a sentinel mismatch. Without reading the caught exception, the verdict looks like a target-side hash incompatibility. | Read the **caught exception text** from the operation log before drawing target-side conclusions — `script-authoring`'s "Catch blocks must log before falling back" rule applies. Then fix the real bug: pass `%AccountEntry%` (the whole shadow line, captured via `getent shadow `) directly to `CompareShadowHash.SaltedHash`. Do not pre-split. Mirror [`samples/ssh/generic-linux/GenericLinux.json`](../../samples/ssh/generic-linux/GenericLinux.json) lines 220–245: capture, compare whole, condition on `PasswordHashMatched`, and only fall back to a login-as-test pattern in a `Catch` block (and only when `getent` is genuinely unavailable, e.g., locked-down sudo). yescrypt is not the problem. | +| `HTTP Request threw an exception ... AuthenticationException: The remote certificate is invalid according to the validation procedure: RemoteCertificateNameMismatch, RemoteCertificateChainErrors` (on the very first `Request` component log line `Sending request [...] [Skip SSL Validation=False]`; subsequent `Block returned error state` followed by the operation's `Catch` returning a sentinel like `false`, which SPP surfaces as `PasswordMismatch` even though the script never reached the auth step) | connect | The `Request` component does not skip server certificate validation by default — its log emits `[Skip SSL Validation=False]` when the flag is unset. Targets with self-signed or hostname-mismatched certificates fail the TLS handshake before any HTTP traffic. If the operation's outer `Try` catches all exceptions and returns `false`, SPP sees a clean `CheckPassword` verdict of "mismatch" with no auth attempt — the cert error is buried mid-log. | Declare the **reserved parameter** `SkipServerCertValidation` (`Type: Boolean`, `DefaultValue: false`) on every operation and on any function that issues a `Request`, then set `"IgnoreServerCertAuthentication": "%{SkipServerCertValidation}%"` on every `Request` block — including token-refresh/login calls inside helper functions. SPP auto-sources the parameter from the asset's `VerifySslCertificate` flag (asset-level toggle via `Edit-SafeguardAsset -VerifyServerSslCertificate $false` for self-signed labs), so no `-CustomScriptParameters` plumbing is needed at onboarding. See [`docs/reference/reserved-parameters.md`](../reference/reserved-parameters.md) line 128 and [`docs/guides/http-platforms.md`](../guides/http-platforms.md) lines 638–651. **Do not invent a custom parameter name** — reserved names get the auto-population for free; custom names require the operator to remember `-CustomScriptParameters` on every asset. Also audit `Catch` blocks: a `Catch` that swallows all exceptions and returns a domain sentinel (`false` for `CheckPassword`, `true` for `CheckSystem`) hides connect-phase failures as operation-phase verdicts — the `script-authoring` "Catch blocks must log before falling back" rule applies. | +| `Response status: BadRequest` against a target that accepts the **identical** body when sent manually with `Invoke-RestMethod` or `curl`; the log shows `Sending request [...] [Skip SSL Validation=True]` followed by a 400 response from an endpoint that expects form fields containing `@` or `%`. The `Request` block uses `Content: { "Value": "%PreBuiltBody%", "ContentType": "application/x-www-form-urlencoded" }` where `%PreBuiltBody%` is a string built by `SetItem` with `UrlEncode`-ed fields concatenated by hand. | parse | `Request.Content.Value` is **not a documented field** ([`docs/reference/commands/request.md`](../reference/commands/request.md) lines 51–55 only define `Content.ContentObjectName` and `Content.ContentType`; line 158 explicitly states "`ContentType` is only applied when `ContentObjectName` is present"). When a script supplies `Content.Value` with form ContentType, the engine's behavior is undefined — observed cases include re-encoding `%40` to `%2540` (double-encoding) and silently dropping the body. The target then sees corrupted or absent form fields and returns 400 "Parameter verification failed" or similar — fooling diagnosis into thinking the credential is wrong. | Build a form object with one `SetFormValue` per field (the form is created on first use; default `CreateForm: "CreateIfNotFound"` is correct), then reference it from `Request.Content.ContentObjectName`. The engine URL-encodes each field exactly once. Mirror [`samples/http/twitter/CustomTwitter.json`](../../samples/http/twitter/CustomTwitter.json) lines 133–148 (CheckPassword) and 225–242 (ChangePassword). Do **not** pre-`UrlEncode` field values when feeding them through `SetFormValue` — the component encodes during serialization. Reserve `UrlEncode` + `SetItem` for URL **path** components substituted into `Request.Url` (where `Request.Content` is not involved). |