docs(lambda): add docs/deploy/aws-lambda.mdx deployment guide #914
jrusso1020 wants to merge 9 commits into
Warning: This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
This stack of pull requests is managed by Graphite.
Force-pushed from 51a1c41 to c8a3a10.
Force-pushed from 63a62c1 to 1265426.
Force-pushed from c8a3a10 to c0895ef.
Force-pushed from 1265426 to 9c7e205.
miguel-heygen left a comment:
CI green (regression, player-perf, preview-regression all pass; Graphite pending is expected for a stacked PR).
Content looks accurate. A few specific things I verified:
Architecture diagram — the dispatch model (Plan/RenderChunk/Assemble → one Lambda handler → S3) matches the handler.mjs structure described elsewhere in the stack.
Three deployment paths — CLI → SAM → CDK is a clean progression; the CDK construct exposing .bucket, .renderFunction, .stateMachine is consistent with the described CloudFormation outputs (RenderBucketName, RenderStateMachineArn, RenderFunctionArn).
IAM section — policies user / policies role / policies validate subcommands are well-documented with the Resource: "*" narrowing note; the CI gate pattern for validate is a good call.
Cost accounting — the $0.0214 example and the pointer to costAccounting.ts for auditability is correct in principle. One minor nit: the cost line says "S3 transfer is not included" but doesn't mention S3 GET/PUT request costs either — worth a one-liner so adopters don't expect the number to be complete.
Troubleshooting section — PLAN_HASH_MISMATCH, BROWSER_GPU_NOT_SOFTWARE, stuck-at-RUNNING, and S3 Retain bucket are all realistic failure modes with actionable guidance. FONT_FETCH_FAILED / FFMPEG_VERSION_MISMATCH are mentioned in the stuck-render entry but not given their own entries — fine for v1 docs.
"What's NOT in v1" section — useful, explicitly limits scope. The reference to "PR 6.10 on the plan" for compositions discovery is slightly internal; readers won't know what that means. Consider replacing with "in a future release" or linking to a tracking issue.
No broken links spotted. [CLI reference](/packages/cli#hyperframes-lambda) assumes that anchor exists on the CLI page — make sure the CLI PR in the stack adds it.
Approved.
vanceingalls left a comment:
One-line summary: docs page is well-structured and mostly accurate, but the BROWSER_GPU_NOT_SOFTWARE troubleshooting entry points users at a non-existent data-gpu-mode composition attribute — that's a blocker on a docs PR.
Additive review — @miguel-heygen already covered the S3 request-cost line, the internal "PR 6.10" reference, and the [CLI reference] anchor concern. I won't repeat those. The findings below are gaps I didn't see in Miguel's review.
Strengths
- docs/deploy/aws-lambda.mdx:138-152: the IAM bootstrap section is genuinely strong. It walks through policies user|role|validate, notes the Resource: "*" narrowing path, and explicitly recommends policies validate as a CI pre-deploy step. Matches the source intent in packages/cli/src/commands/lambda/policies.ts:1-22.
- docs/deploy/aws-lambda.mdx:155-167: the cost example output ($0.0214 (Lambda $0.0210 + SFN $0.0004)) matches the actual progress output formatter in packages/cli/src/commands/lambda/progress.ts:46-48. Concrete and verifiable.
- The "What's NOT in v1 surface" section at the bottom is the right shape; adopters waste hours looking for missing webhooks/HDR without a callout like this.
Findings
blocker — docs/deploy/aws-lambda.mdx:173 (Troubleshooting: BROWSER_GPU_NOT_SOFTWARE). The doc tells users:
"The compiled composition reads data-gpu-mode="hardware" (or similar). [...] Change the composition's data-gpu-mode or omit it (the default is software)."
I grepped the entire repo at the PR head: there is no data-gpu-mode attribute handling anywhere in packages/engine, packages/producer, or packages/aws-lambda. The only hits are this doc line and an unrelated gpuModes array in packages/cli/src/commands/render.ts:422 (local dev-render output, not distributed). The actual error source is packages/engine/src/utils/assertSwiftShader.ts:107-122: it reads chrome://gpu after launch and throws if the GL backend isn't SwiftShader. Its own thrown message says:
"Ensure Chrome was launched with --use-gl=swiftshader --use-angle=swiftshader and that the SwiftShader libraries are present in the runtime image."
i.e. the failure is a Lambda runtime-image / launch-flags problem, NOT a composition attribute. An adopter who hits this error and follows the doc's advice will edit a non-existent attribute on their composition and the error will persist. Worse than no advice. Replace this entry with the actual root cause (Chrome launch flags / SwiftShader libs in the handler ZIP) and the actual remediation (rebuild the ZIP with bun run --cwd packages/aws-lambda build:zip and re-deploy, since lambda deploy rebuilds the ZIP that bundles @sparticuz/chromium).
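To make the layering concrete, here is a minimal sketch of the kind of post-launch check described above: given the GL renderer string already read from chrome://gpu, throw unless it names SwiftShader. The class and function names are hypothetical (the real assertSwiftShader.ts reads chrome://gpu itself); only the error code and the remediation wording come from the review.

```typescript
// Hypothetical error type carrying the typed code operators search for.
class BrowserGpuNotSoftwareError extends Error {
  readonly code = "BROWSER_GPU_NOT_SOFTWARE";
}

// Sketch: `glRenderer` is assumed to be the renderer string scraped from chrome://gpu.
// Nothing from the composition is consulted; only the browser launch matters.
function assertSoftwareGl(glRenderer: string): void {
  if (/swiftshader/i.test(glRenderer)) return;
  throw new BrowserGpuNotSoftwareError(
    `GL backend is "${glRenderer}", not SwiftShader. ` +
      "Ensure Chrome was launched with --use-gl=swiftshader --use-angle=swiftshader " +
      "and that the SwiftShader libraries are present in the runtime image.",
  );
}
```

Because the check only sees what the launched browser reports, the remedy has to come from the launch flags or the runtime image (a rebuilt handler ZIP), never from a composition attribute.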
important — coverage gap: hyperframes lambda sites create is not mentioned anywhere in the doc. The CLI's own HELP at packages/cli/src/commands/lambda.ts:18-21 and its examples array call it out as a first-class workflow ("Pre-upload a project so multiple renders share the upload"), and the render subcommand explicitly supports a --site-id flag that consumes its output (packages/cli/src/commands/lambda/render.ts:51-60). For a page titled "Three deployment paths" that's supposed to take adopters from credentials to rendered MP4, omitting the sites workflow leaves users on Path 1 re-tarring + re-uploading the same project on every render — exactly the cost shape the page elsewhere tries to avoid. Add a sites create subsection (Path 1.5 or a "Re-using uploads" callout under Path 1).
important — SAM-path concurrency default mismatch. The doc's framing under Path 1 (docs/deploy/aws-lambda.mdx:62-67) explains why --concurrency=8 is a conservative default that bounds runaway spend, and the Path 2 SAM example happens to pass ReservedConcurrency=8. But the SAM template's own default is -1 (unreserved) — see examples/aws-lambda/template.yaml:36-42. A reader who simplifies the Path 2 example by dropping --parameter-overrides is silently switched from "conservative 8-cap" to "account-default unreserved." Worth one extra line in the Path 2 section: "Drop ReservedConcurrency from --parameter-overrides at your own risk — the template's own default is -1 (unreserved)." Same warning shape as the Path 1 paragraph.
nit — docs/deploy/aws-lambda.mdx:30 ("HyperFrames repo checkout"). Says lambda deploy builds the ZIP from source, and adopters who deploy outside a checkout can set HYPERFRAMES_REPO_ROOT. Verified accurate (packages/cli/src/commands/lambda/repoRoot.ts:15-30). But the env var is undocumented anywhere outside this single table row — worth a one-liner in the env-var reference (if one gets added later), or at least a fuller example here showing the directory structure it expects ($HYPERFRAMES_REPO_ROOT/packages/aws-lambda/package.json must exist).
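The directory-structure requirement can be sketched as a single existence check, assuming the only contract is that $HYPERFRAMES_REPO_ROOT/packages/aws-lambda/package.json exists. `resolveRepoRoot` is a hypothetical name; the discovery logic repoRoot.ts uses when the variable is unset is not modelled here.

```typescript
import { existsSync } from "node:fs";
import { join, resolve } from "node:path";

// Hypothetical helper: validate HYPERFRAMES_REPO_ROOT before building the handler ZIP.
function resolveRepoRoot(env: Record<string, string | undefined>): string | undefined {
  const root = env.HYPERFRAMES_REPO_ROOT;
  if (!root) return undefined; // caller falls back to its own checkout discovery
  const marker = join(resolve(root), "packages", "aws-lambda", "package.json");
  if (!existsSync(marker)) {
    throw new Error(
      `HYPERFRAMES_REPO_ROOT=${root} does not look like a HyperFrames checkout: ${marker} is missing`,
    );
  }
  return resolve(root);
}
```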
nit — docs/deploy/aws-lambda.mdx:177 (stuck-at-RUNNING entry) lists FONT_FETCH_FAILED and FFMPEG_VERSION_MISMATCH as examples of typed errors the SFN console surfaces. Verified those names exist in packages/aws-lambda/src/cdk/HyperframesRenderStack.ts:193-207 (alarm dimensions). Miguel suggested giving each its own troubleshooting entry; I'll second that as a low-priority follow-up since these are the most common production failure modes after PLAN_HASH_MISMATCH.
nit — docs/deploy/aws-lambda.mdx:55-58 deploy example doesn't pass --profile, but the CLI documents it (packages/cli/src/commands/lambda.ts:74). For users on multi-account setups, a one-liner mentioning the flag (or the AWS_PROFILE env var fallback that deploy.ts:42 reads) would head off a class of "wrong-account deploy" pitfalls.
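The precedence the nit alludes to can be sketched as follows, assuming an explicit --profile wins over AWS_PROFILE and that the AWS SDK's own "default" profile applies when neither is set. `resolveProfile` and `describeDeployTarget` are hypothetical, not functions in deploy.ts.

```typescript
// Hypothetical precedence: --profile flag, then AWS_PROFILE, then the SDK's "default".
function resolveProfile(
  flag: string | undefined,
  env: Record<string, string | undefined>,
): string {
  return flag?.trim() || env.AWS_PROFILE?.trim() || "default";
}

// Echoing the resolved profile before deploying makes a wrong-account deploy visible.
function describeDeployTarget(
  flag: string | undefined,
  env: Record<string, string | undefined>,
): string {
  return `Deploying with AWS profile "${resolveProfile(flag, env)}"`;
}
```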
Verdict
Verdict: REQUEST CHANGES
Reasoning: the BROWSER_GPU_NOT_SOFTWARE entry actively misleads — it tells adopters to edit a composition attribute that doesn't exist, instead of the real runtime-image fix. That's a blocker on a docs page where the troubleshooting section is the load-bearing reason users land there. Everything else is fixable or punt-able. Fix the GPU entry, optionally add a sites create subsection, and this is good to ship.
Review by Vai
Force-pushed from c0895ef to 6faac80.
Force-pushed from 9c7e205 to 8d6ffe5.
Force-pushed from 6faac80 to 15289f3.
Force-pushed from 8d6ffe5 to 0ded0c0.
miguel-heygen left a comment:
Blocker from the previous review is addressed:
Troubleshooting entry referenced non-existent data-gpu-mode attribute — Fixed. The bogus data-gpu-mode attribute reference is gone. The troubleshooting section now correctly documents the BROWSER_GPU_NOT_SOFTWARE error with the real fix: rebuild the handler ZIP and redeploy. The explanation correctly identifies that the issue is at the runtime-image / launch-flags layer (SwiftShader via --use-gl=swiftshader --use-angle=swiftshader), not at the composition layer, and that lambda deploy always rebuilds the ZIP so a redeploy resolves it.
vanceingalls left a comment:
Re-review of 0ded0c07 against my prior REQUEST CHANGES at 4304554554.
Resolution status
- Blocker (BROWSER_GPU_NOT_SOFTWARE pointed at non-existent data-gpu-mode): resolved. Grepped HEAD (docs/, packages/, examples/); zero hits for data-gpu-mode. New entry at docs/deploy/aws-lambda.mdx:189-198 now correctly attributes the failure to the runtime-image / launch-flags layer and tells adopters to rebuild via bun run --cwd packages/aws-lambda build:zip (verified script exists in packages/aws-lambda/package.json) and redeploy. The Chrome flag pair cited (--use-gl=swiftshader --use-angle=swiftshader) matches what assertSwiftShader.ts:121 says is required. Advice now leads to the actual fix.
- Important (missing sites create workflow): resolved. New "Pre-staging a project with sites create" subsection at docs/deploy/aws-lambda.mdx:76-88 documents the workflow, the --site-id consumer, and the content-addressing semantics. The SHA-256 + HeadObject short-circuit claim is grounded in packages/aws-lambda/src/sdk/deploySite.ts:114-126.
- Important (SAM ReservedConcurrency default -1 mismatch): resolved. Warning callout at docs/deploy/aws-lambda.mdx:113-115 correctly states the SAM template's own default is -1 (unreserved) and warns about silently dropping the override. Matches examples/aws-lambda/template.yaml:40-42.
- Nits (HYPERFRAMES_REPO_ROOT depth, --profile/AWS_PROFILE): not addressed. These were optional and remain optional; author's call.
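The content-addressing semantics credited to deploySite.ts can be sketched this way: hash the project tarball with SHA-256, derive the object key from the digest, and skip the upload when a HeadObject-style existence check already finds that key. The `ObjectStore` interface, the `sites/<digest>.tar.gz` key layout, and `deploySiteSketch` are assumptions for illustration, not the SDK's real API.

```typescript
import { createHash } from "node:crypto";

// Minimal stand-in for the two S3 calls involved (HeadObject and PutObject).
interface ObjectStore {
  exists(key: string): Promise<boolean>;
  put(key: string, body: Uint8Array): Promise<void>;
}

// Hypothetical: returns the content-derived site id and whether bytes were uploaded.
async function deploySiteSketch(
  store: ObjectStore,
  tarball: Uint8Array,
): Promise<{ siteId: string; uploaded: boolean }> {
  const siteId = createHash("sha256").update(tarball).digest("hex");
  const key = `sites/${siteId}.tar.gz`; // assumed key layout
  if (await store.exists(key)) {
    return { siteId, uploaded: false }; // identical content is already staged
  }
  await store.put(key, tarball);
  return { siteId, uploaded: true };
}
```

Because the id depends only on content, re-staging an unchanged project costs one existence check, and the same id can be handed to render --site-id for every subsequent render.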
Scope check
Diff between 149555f...0ded0c0 touches one file (docs/deploy/aws-lambda.mdx). No scope creep.
CI
mergeStateStatus=UNSTABLE is failing optional checks only — check_runs shows no failure conclusions on the head SHA. Per Rule 5, this is mention-not-block.
Verdict
Verdict: APPROVE
Reasoning: the blocker is fixed at the root (advice now points to the real runtime-image / Chrome-flags fix instead of a phantom composition attribute), both important items are addressed with technically accurate framing, and nothing else regressed. Nits are author's call.
Review by Vai
The base branch was changed.
Force-pushed from 0ded0c0 to a6f848e.
miguel-heygen left a comment:
Re-approve after rebase. Diff verified unchanged — no data-gpu-mode references.
vanceingalls left a comment:
Re-approve after rebase onto main. Force-push dismissed my prior --approve (require_last_push_approval: true) — content unchanged, same commits replayed on the new base. All findings from the prior review's resolution still apply.
Re-review by Vai (post-rebase re-stamp)
Force-pushed from a6f848e to 408dd28.
miguel-heygen left a comment:
Re-approve on 408dd28. Same content.
vanceingalls left a comment:
Re-approve after rebase + #910 smoke fix.
#910 adds the CLI smoke fix on top: @hyperframes/aws-lambda moved to devDependencies, dispatcher dynamic-imports @hyperframes/aws-lambda/sdk (lambda.ts:150) with a friendly ERR_MODULE_NOT_FOUND → npm install handler at :152-158. npm pack / npm install now works because there's no workspace:* protocol in published dependencies. Clean fix.
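The dispatcher pattern described here (an optional dependency loaded with a dynamic import() plus a friendly hint on ERR_MODULE_NOT_FOUND) can be sketched as below. The importer is injected so the error path can be exercised without uninstalling anything; `loadLambdaSdk` and the hint wording are assumptions, not the code at lambda.ts:150-158.

```typescript
type Importer = (specifier: string) => Promise<unknown>;

const LAMBDA_SDK = "@hyperframes/aws-lambda/sdk";

// Hypothetical loader: the optional SDK is imported only when a lambda subcommand runs.
async function loadLambdaSdk(importer: Importer = (s) => import(s)): Promise<unknown> {
  try {
    return await importer(LAMBDA_SDK);
  } catch (err) {
    const code = (err as { code?: string } | null)?.code;
    if (code === "ERR_MODULE_NOT_FOUND") {
      throw new Error(
        `The lambda commands need ${LAMBDA_SDK}. Install it with: npm install @hyperframes/aws-lambda`,
      );
    }
    throw err; // unrelated failures propagate unchanged
  }
}
```

Keeping the package out of the published dependencies and resolving it lazily is what lets npm pack / npm install succeed without a workspace:* reference.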
#912/#913/#914/#915 are pure rebases on top — same commits replayed on the new base, content unchanged vs. the last approved round. Findings from the prior review's resolution still apply.
Re-review by Vai (post-smoke-fix re-stamp)
IAM bootstrap subcommand for the lambda CLI. Closes the "first run hits
'User is not authorized to perform iam:CreateRole'" gap that adopters
otherwise have to figure out by hand.
hyperframes lambda policies user
→ prints an inline-policy doc to attach to the IAM user that runs
the CLI
hyperframes lambda policies role --principal=cloudformation
→ prints { TrustRelationship, InlinePolicy } for a service role
cloudformation can assume
hyperframes lambda policies validate ./infra/policy.json
→ diffs a checked-in policy against the CLI's required action set,
expanding s3:* / s3:Get* / * wildcards, exits non-zero on missing
actions (wire it into CI to catch drift before deploys fail)
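The drift check can be sketched as a set difference: expand each granted pattern (an exact action, "*", or an end-anchored wildcard such as s3:* or s3:Get*) against the required list and report what stays uncovered. The statement shape is simplified and `findMissingActions` is hypothetical, not the CLI's validatePolicy.

```typescript
interface Statement {
  Effect: "Allow" | "Deny";
  Action: string | string[];
}

// True when an IAM action pattern grants `action`. Handles exact names, "*",
// and end-anchored wildcards. IAM compares action names case-insensitively.
function patternGrants(pattern: string, action: string): boolean {
  const p = pattern.toLowerCase();
  const a = action.toLowerCase();
  if (p === "*") return true;
  if (p.endsWith("*")) return a.startsWith(p.slice(0, -1));
  return p === a;
}

// Required actions not granted by any Allow statement. Deny statements are simply
// not counted as grants in this sketch.
function findMissingActions(required: string[], statements: Statement[]): string[] {
  const granted = statements
    .filter((s) => s.Effect === "Allow")
    .flatMap((s) => (Array.isArray(s.Action) ? s.Action : [s.Action]));
  return required.filter((action) => !granted.some((p) => patternGrants(p, action)));
}
```

A CI step would exit non-zero whenever the returned array is non-empty.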
The required-actions list is derived from what the SAM template at
examples/aws-lambda/template.yaml needs to create plus what
renderToLambda/getRenderProgress call against S3 + Step Functions at
runtime. Sorted alphabetically per-service so diffs stay readable.
Resource is "*" by design — CloudFormation creates new function /
state-machine / bucket ARNs on every adopter's first deploy. The
generated policy is documented as a starting point; adopters with
stricter postures narrow Resource to the deployed ARNs after the
first successful run.
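A minimal sketch of the generated user policy's shape, assuming a single Allow statement with Resource "*" and the sort order described above. `buildUserPolicy` is illustrative, not the CLI's exact output.

```typescript
interface PolicyStatement {
  Effect: "Allow";
  Action: string[];
  Resource: "*";
}

interface PolicyDocument {
  Version: "2012-10-17";
  Statement: PolicyStatement[];
}

// Hypothetical builder. Sorting the full "service:Action" strings groups actions by
// service prefix and orders them alphabetically inside each group, keeping diffs small.
// Resource stays "*" because CloudFormation mints fresh ARNs on the first deploy.
function buildUserPolicy(actions: string[]): PolicyDocument {
  const sorted = [...new Set(actions)].sort();
  return {
    Version: "2012-10-17",
    Statement: [{ Effect: "Allow", Action: sorted, Resource: "*" }],
  };
}
```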
Tests: 10 unit tests covering the action set, doc shape, trust policy
service principal, and validate() against valid / missing / wildcard /
single-Statement / Deny-statement inputs.
Adds a typed TrustPolicyDocument / TrustPolicyStatement pair so
buildRoleTrustPolicy can return a real type instead of unknown. The
trust-policy shape has a Principal field that the generic
PolicyStatement doesn't model, but it was previously punted via a
return unknown rather than a parallel type.
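The parallel type might look like the following. Field names follow standard IAM trust-policy JSON (Principal.Service, sts:AssumeRole); the exact declarations and the function body are a sketch rather than the committed code.

```typescript
// A trust-policy statement names a Principal and grants sts:AssumeRole,
// instead of listing Resources like a permissions statement.
interface TrustPolicyStatement {
  Effect: "Allow";
  Principal: { Service: string };
  Action: "sts:AssumeRole";
}

interface TrustPolicyDocument {
  Version: "2012-10-17";
  Statement: TrustPolicyStatement[];
}

// Sketch for the CloudFormation service principal used by `policies role`.
function buildRoleTrustPolicy(): TrustPolicyDocument {
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Effect: "Allow",
        Principal: { Service: "cloudformation.amazonaws.com" },
        Action: "sts:AssumeRole",
      },
    ],
  };
}
```

With a concrete return type, tests can read Statement[0].Principal.Service directly, which is what makes the `as {...}` casts unnecessary.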
Test cleanup: drop the `as {...}` casts that the previous return-
unknown signature forced.
One blocker + four importants from Vai's review:
- REQUIRED_ACTIONS was missing `s3:ListAllMyBuckets` (called by
`sam deploy --resolve-s3` on first run to discover/create the
`aws-sam-cli-managed-default-*` artifact bucket) and
`cloudformation:ValidateTemplate` (CFN template validation
during change-set creation). Without these, a first-deploy
adopter with the generated policy hits AccessDenied on the
very call the PR was meant to unblock. Added both.
- `policies role --principal=lambda` was a footgun — it produced
a `lambda.amazonaws.com` trust paired with the full deploy
superset, i.e. a confusingly-overscoped Lambda execution role
no human should attach (the SAM template creates its own
scoped execution role automatically). Dropped `lambda` as a
principal option; `policies role` now always emits a
CloudFormation service-role doc.
- `validatePolicy` silently misreported NotAction/NotResource
statements (treating them as zero grants), producing false
negatives. Detect both shapes and surface them via a new
`warnings: string[]` field; NotAction statements are skipped
(rather than producing a false negative), NotResource is
treated as full action grant + a warning.
- Mid-string wildcards (`s3:Get*Object`, `?`) silently failed
the matcher. End-anchored wildcards still work; mid-string
patterns now warn so users know the validator can't expand
them.
- Dropped the dead `samArtifactBucket` action group (fully
subsumed by `s3Bucket` + `s3Object`).
- `validate --json` now wraps errors in a friendly envelope
(`{ ok: false, error: "..." }`) so CI consumers have one
parse shape regardless of failure mode.
- lambda.ts subcommand description and examples updated to
include `policies`.
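The warning behaviour in this round can be sketched as a pre-pass over Allow statements: NotAction statements are skipped with a warning, NotResource statements still count as granting their actions (with a warning), and patterns with "?" or a "*" before the last character are flagged as unexpandable. `collectGrants`, `jsonError`, and the message wording are assumptions; only the warnings field and the { ok: false, error } envelope come from the commit.

```typescript
interface RawStatement {
  Effect: "Allow" | "Deny";
  Action?: string | string[];
  NotAction?: string | string[];
  Resource?: string | string[];
  NotResource?: string | string[];
}

interface ValidationResult {
  grantedPatterns: string[];
  warnings: string[];
}

const toList = (v: string | string[] | undefined): string[] =>
  v === undefined ? [] : Array.isArray(v) ? v : [v];

// Hypothetical pre-pass: surface warnings instead of silently producing false negatives.
function collectGrants(statements: RawStatement[]): ValidationResult {
  const grantedPatterns: string[] = [];
  const warnings: string[] = [];
  statements.forEach((s, i) => {
    if (s.Effect !== "Allow") return;
    if (s.NotAction !== undefined) {
      warnings.push(`Statement[${i}] uses NotAction; skipped`);
      return;
    }
    if (s.NotResource !== undefined) {
      warnings.push(`Statement[${i}] uses NotResource; actions treated as fully granted`);
    }
    for (const pattern of toList(s.Action)) {
      const star = pattern.indexOf("*");
      if (pattern.includes("?") || (star !== -1 && star !== pattern.length - 1)) {
        // Still recorded, but the end-anchored matcher will not expand it.
        warnings.push(`Statement[${i}] pattern "${pattern}" has a mid-string wildcard`);
      }
      grantedPatterns.push(pattern);
    }
  });
  return { grantedPatterns, warnings };
}

// `validate --json` failure envelope: one parse shape for CI consumers.
function jsonError(err: unknown): string {
  return JSON.stringify({ ok: false, error: err instanceof Error ? err.message : String(err) });
}
```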
Tests: 5 new negative-path tests cover NotAction warning,
NotResource warning, mid-string wildcard warning, missing file
(ENOENT), malformed JSON (SyntaxError), and absent Statement
field. All 21 policies tests pass.
Third harness mode that drives the OSS @hyperframes/aws-lambda handler
through the exact event sequence Step Functions produces in
production:
handler({Action: "plan"}) → planDir tarball on fake S3
handler({Action: "renderChunk"}) × N → chunk artifacts on fake S3
handler({Action: "assemble"}) → final mp4/mov/png-sequence
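The event sequence above can be sketched as a small driver loop. Only the Action values (plan, renderChunk, assemble) come from the text; the chunkIndex field, the plan result carrying chunkCount, and the Handler type are hypothetical stand-ins for the real event JSON.

```typescript
type LambdaEvent =
  | { Action: "plan" }
  | { Action: "renderChunk"; chunkIndex: number }
  | { Action: "assemble" };

// Hypothetical handler signature; the plan step is assumed to report the fan-out width.
type Handler = (event: LambdaEvent) => Promise<{ chunkCount?: number }>;

// Mirrors the state machine: Plan, then Map(N) over chunks, then Assemble.
// Chunks run sequentially here; a Step Functions Map state can run them in parallel.
async function driveLikeStepFunctions(handler: Handler): Promise<LambdaEvent[]> {
  const sent: LambdaEvent[] = [];
  const send = async (event: LambdaEvent) => {
    sent.push(event);
    return handler(event);
  };
  const plan = await send({ Action: "plan" });
  for (let i = 0; i < (plan.chunkCount ?? 0); i++) {
    await send({ Action: "renderChunk", chunkIndex: i });
  }
  await send({ Action: "assemble" });
  return sent;
}
```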
The S3 client is a filesystem-backed fake (every s3://<bucket>/<key>
URI maps to <tempRoot>/s3/<key>), so the harness exercises the
handler's event-parsing + tar/S3 conventions + dispatch logic on top
of the underlying producer primitives. Regressions in event JSON
shape, S3 key layout, or plan-hash boundary checks now surface in
the same CI run as the in-process and distributed-simulated modes
without paying for a real AWS round-trip.
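The filesystem-backed fake's key mapping can be sketched as below. It follows the stated rule (s3://<bucket>/<key> maps to <tempRoot>/s3/<key>, so the bucket name is dropped) and the single Buffer/string PutObject branch described later in the stack. The class name and method signatures are simplified assumptions, not FilesystemBackedFakeS3.send.

```typescript
import { mkdirSync, readFileSync, writeFileSync } from "node:fs";
import { dirname, join } from "node:path";

// Simplified stand-in for the harness's fake S3 client.
class FakeS3OnDisk {
  private readonly tempRoot: string;

  constructor(tempRoot: string) {
    this.tempRoot = tempRoot;
  }

  // s3://<bucket>/<key> -> <tempRoot>/s3/<key>; the bucket is intentionally ignored.
  pathFor(uri: string): string {
    const match = /^s3:\/\/[^/]+\/(.+)$/.exec(uri);
    if (!match) throw new Error(`not an s3:// URI with a key: ${uri}`);
    return join(this.tempRoot, "s3", match[1]);
  }

  putObject(uri: string, body: Buffer | string): void {
    const target = this.pathFor(uri);
    mkdirSync(dirname(target), { recursive: true });
    writeFileSync(target, body); // one branch: writeFileSync accepts both body types
  }

  getObject(uri: string): Buffer {
    return readFileSync(this.pathFor(uri));
  }
}
```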
Deliberately NOT a Docker/RIE invocation — that would gate the
producer test suite on Docker-in-Docker support which most CI
runners lack. Real-ZIP-via-RIE tests live in
packages/aws-lambda/scripts/ (probe:beginframe) and the
maintainer-run smoke.sh.
Wired up via:
- HarnessMode union extended to include "lambda-local"
- parseHarnessModeFlag accepts --mode=lambda-local
- regression-harness.ts dispatches to runLambdaLocalRender for
the new mode, sharing the distributed-support gate +
pathology-floor threshold with distributed-simulated mode
- package.json scripts: test:lambda-local + docker:test:lambda-local
- producer.devDependencies += @hyperframes/aws-lambda (workspace)
- producer/tsconfig.json gains path mappings to self so the type
cycle through aws-lambda's source resolves at typecheck time
without needing producer to be pre-built
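The flag parsing can be sketched as follows, assuming the three mode names used in this stack and in-process as the default when no flag is passed. The real parseHarnessModeFlag may differ in defaults and error handling.

```typescript
type HarnessMode = "in-process" | "distributed-simulated" | "lambda-local";

const HARNESS_MODES: readonly HarnessMode[] = ["in-process", "distributed-simulated", "lambda-local"];

// Sketch: find --mode=<value> in argv and validate it against the union.
function parseHarnessModeFlag(argv: string[]): HarnessMode {
  const flag = argv.find((arg) => arg.startsWith("--mode="));
  if (!flag) return "in-process"; // assumed default
  const value = flag.slice("--mode=".length);
  if ((HARNESS_MODES as readonly string[]).includes(value)) return value as HarnessMode;
  throw new Error(`Unknown --mode "${value}"; expected one of: ${HARNESS_MODES.join(", ")}`);
}

// lambda-local shares the distributed-support gate with distributed-simulated.
const needsDistributedSupport = (mode: HarnessMode): boolean => mode !== "in-process";
```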
Tests: 3 new unit tests on parseHarnessModeFlag + resolveMinPsnrForMode
cover the new mode. End-to-end PSNR contract still runs through
Dockerfile.test (manual + CI).
Three small cleanups on top of the lambda-local harness:
- Drop the unused createReadStream import + its `void` workaround
comment. The aws-lambda handler's tar / S3 transport pulls
createReadStream from its own imports; this file never references
it directly.
- Hoist the dynamic `await import("node:fs")` calls for
writeFileSync out of FilesystemBackedFakeS3.send into the static
import block. Repeated PutObject calls don't need to repay the
dynamic-import cost.
- Hoist the dynamic `await import("@hyperframes/aws-lambda")` call
for untarDirectory similarly. Drops the now-redundant duplicate
aws-lambda import statement.
The PutObject body branch also collapses: `body instanceof Buffer`
and `typeof body === "string"` both call writeFileSync identically,
so they share one branch.
No behavior changes.
The static import of regression-harness-lambda-local.ts pulled @hyperframes/aws-lambda (and its @aws-sdk/* + @sparticuz/chromium transitive deps) at module-load time. Dockerfile.test only copies the producer's own files into the container, so aws-lambda's src isn't present at runtime, and even `--mode=in-process` failed:
Error [ERR_MODULE_NOT_FOUND]: Cannot find module '/app/packages/producer/node_modules/@hyperframes/aws-lambda/src/index.ts' imported from /app/packages/producer/src/regression-harness-lambda-local.ts
Load the module on demand instead. `--mode=lambda-local` callers pay the import cost; the existing in-process and distributed-simulated modes don't.
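The on-demand loading can be sketched as a memoised loader that only runs when lambda-local is selected. The loader is injected so the sketch runs without @hyperframes/aws-lambda installed; `lazyModule` and `runHarness` are hypothetical, and the real fix may simply inline `await import(...)` at the dispatch site.

```typescript
// Imports on first call and reuses the same promise afterwards. A failed load is
// forgotten so a later call can retry instead of replaying the error.
function lazyModule<T>(load: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | undefined;
  return () => {
    pending ??= load().catch((err: unknown) => {
      pending = undefined;
      throw err;
    });
    return pending;
  };
}

type Mode = "in-process" | "distributed-simulated" | "lambda-local";

// Only lambda-local touches the loader; the other modes never import aws-lambda.
async function runHarness(
  mode: Mode,
  getLambdaLocal: () => Promise<{ runLambdaLocalRender: () => Promise<string> }>,
): Promise<string> {
  if (mode !== "lambda-local") return `${mode}: no aws-lambda import`;
  const { runLambdaLocalRender } = await getLambdaLocal();
  return runLambdaLocalRender();
}
```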
Three review items from Vai:
- `Config.width`/`Config.height` are now plumbed through
RunLambdaLocalInput rather than hardcoded inside
runLambdaLocalRender. Lambda-local's whole point is to catch
event-shape drift; if the handler ever starts honouring
Config.width/height (e.g. for canvas sizing), having those
values flow from the caller means the harness sees what the
fixture authored. The interface change makes the eventual
upgrade-to-real-fixture-resolution a one-line dispatch swap.
- Drop the dead `export type { Fps }` and its unused import
from @hyperframes/core. The module never re-exports it.
- The dispatch site in regression-harness.ts now passes 1920×1080
explicitly with a comment marking it as a placeholder until
the harness compiles the composition HTML up-front to surface
the authored data-width/data-height. distributed-simulated
mode uses the same placeholder internally, kept for parity.
No behavior change in the existing modes; lambda-local now has a
clear extension point for honouring fixture dimensions.
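The plumbing change amounts to moving the dimensions into the input type and letting the config builder read them. The width/height names and the 1920×1080 placeholder come from the text; `projectDir`, `PLACEHOLDER_DIMENSIONS`, and `buildPlanConfig` are hypothetical.

```typescript
interface RunLambdaLocalInput {
  projectDir: string; // hypothetical stand-in for the other inputs the real interface carries
  width: number;
  height: number;
}

// Placeholder until the harness compiles the composition HTML up front and reads
// the authored data-width / data-height.
const PLACEHOLDER_DIMENSIONS = { width: 1920, height: 1080 } as const;

// Sketch: Config.width / Config.height flow from the caller instead of being
// hard-coded inside the runner.
function buildPlanConfig(input: RunLambdaLocalInput): { width: number; height: number } {
  return { width: input.width, height: input.height };
}
```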
End-to-end deploy guide for the AWS Lambda surface. Covers:
- Architecture diagram (Step Functions Plan → Map(N) → Assemble +
the single Lambda function dispatching by Action; pulled from
the distributed rendering plan §15.2).
- Prerequisites table (AWS creds, SAM CLI, bun, repo checkout).
- Three deployment paths: hyperframes lambda CLI (recommended),
direct sam deploy against examples/aws-lambda/template.yaml,
and HyperframesRenderStack CDK construct.
- IAM bootstrap via hyperframes lambda policies user/role/validate.
- Cost shape — how Lambda GB-seconds + SFN transitions roll up
into the displayCost the progress verb prints.
- Troubleshooting block with the typed error names operators
actually hit (PLAN_HASH_MISMATCH, BROWSER_GPU_NOT_SOFTWARE,
iam:CreateRole denial, stuck RUNNING, S3 Retain semantics).
- "What's NOT in v1" callout so adopters don't burn time looking
for webhooks / compositions verb / HDR support.
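The cost roll-up can be sketched as GB-seconds times a per-GB-second rate plus state transitions times a per-transition rate. The rates are assumptions (us-east-1 list prices for x86 Lambda and Standard Step Functions workflows; check current AWS pricing). Lambda request charges, S3 requests, and S3 transfer are left out, matching the best-effort caveat. `formatDisplayCost` is hypothetical, not costAccounting.ts.

```typescript
// Assumed list prices (us-east-1); verify against current AWS pricing.
const LAMBDA_USD_PER_GB_SECOND = 0.0000166667; // x86, first pricing tier
const SFN_USD_PER_TRANSITION = 0.025 / 1000; // Standard workflows

interface Invocation {
  memoryMb: number;
  billedMs: number;
}

// Best-effort estimate: Lambda compute plus Step Functions transitions only.
function formatDisplayCost(invocations: Invocation[], stateTransitions: number): string {
  const gbSeconds = invocations.reduce(
    (sum, inv) => sum + (inv.memoryMb / 1024) * (inv.billedMs / 1000),
    0,
  );
  const lambda = gbSeconds * LAMBDA_USD_PER_GB_SECOND;
  const sfn = stateTransitions * SFN_USD_PER_TRANSITION;
  return `$${(lambda + sfn).toFixed(4)} (Lambda $${lambda.toFixed(4)} + SFN $${sfn.toFixed(4)})`;
}
```

For example, 1,260 GB-seconds (one 10,240 MB invocation billed for 126 s) and 16 transitions print `$0.0214 (Lambda $0.0210 + SFN $0.0004)`, the same format as the example in the guide.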
Registered under a new "Deploy" group in docs.json's Documentation
tab, sitting after Packages so the conceptual flow is "what you
can build" → "how to ship it."
No code changes.
One blocker + two important items from Vai's review:
- The BROWSER_GPU_NOT_SOFTWARE troubleshooting entry pointed
adopters at a non-existent `data-gpu-mode` composition attribute.
Replaced with the actual root cause (Chrome launch flags +
@sparticuz/chromium libs in the handler ZIP) and the actual
remediation: rebuild + redeploy via `lambda deploy` (which
always rebuilds the ZIP). The composition-attribute story
would have sent users editing the wrong file entirely.
- Added a `sites create` subsection under Path 1 so adopters
running tight inner loops know how to reuse a project upload
across many renders instead of re-tarring + re-uploading on
each call. The CLI surface was first-class but the doc had
been silent.
- Added a Warning callout under Path 2 explaining that the SAM
template's own ReservedConcurrency default is `-1` (unreserved)
— a reader simplifying the Path 2 example by dropping the
--parameter-overrides flag would silently switch to unreserved
concurrency and pay the runaway-Map cost. The warning mirrors
the cost-shape callout earlier in the page.
Force-pushed from 408dd28 to 24ae310.
miguel-heygen left a comment:
Re-approve on 24ae310. Content-identical.

What
Adds docs/deploy/aws-lambda.mdx, the end-to-end deployment guide for the new AWS Lambda surface. Registered under a new "Deploy" group in the Mintlify nav (docs/docs.json).
Why
Per DISTRIBUTED-RENDERING-PLAN.md § 11 Phase 6b PR 6.7: adopters landing on the docs site need a single page that takes them from "I have AWS credentials" to "I have a rendered video in S3" without having to read the SAM template or the SDK source. The page collects everything the implementation PRs in this stack added.
How
Covers:
- Architecture diagram (Step Functions Plan → Map(N) → Assemble + the single Lambda function dispatching by Action; pulled from DISTRIBUTED-RENDERING-PLAN.md § 15.2).
- Three deployment paths: hyperframes lambda CLI (recommended), direct sam deploy against examples/aws-lambda/template.yaml, and the HyperframesRenderStack CDK construct.
- IAM bootstrap via hyperframes lambda policies user|role|validate.
- Cost shape: how Lambda GB-seconds + SFN transitions roll up into the displayCost the progress verb prints. Notes that it's best-effort and S3 transfer is excluded.
- Troubleshooting with the typed error names operators actually hit (PLAN_HASH_MISMATCH, BROWSER_GPU_NOT_SOFTWARE, the iam:CreateRole denial, stuck RUNNING, the Retain bucket semantics).
No code changes.
Stacks on #909, #910, #912, and #913.
🤖 Generated with Claude Code