ci/deploy: add phase markers to devsite.sh #36794

Draft
bosconi wants to merge 1 commit into main from jc/devsite-phase-markers

bosconi commented May 29, 2026

Summary

Wraps each step of ci/deploy/devsite.sh in ci_collapsed_heading (from misc/shlib/shlib.bash) so the Buildkite UI gets per-phase sections and so log scraping can pin down which sub-step dominates the deploy time.

Motivation

The p50 duration of the deploy-devsite Buildkite step has grown from ~25m to ~48m over the past 60 days (data: 783 builds on main). The step runs serially on a single concurrency slot (concurrency_group: deploy/devsite), so on burst days a run of 5-10 commits can queue 4-8 hours of downstream work (5-10 builds at ~48m each).

Today the script logs only three intermediate markers:

  • Two "Docs are in /mnt/build/doc/index.html" lines from bin/doc
  • The "pdoc failed to precompile the search index" line from bin/pydoc

That is enough to identify two growth modes:

  • bin/doc (public rustdoc) stepped from 3m to 10m between Apr 15 and Apr 30 and has been stable since
  • The combined phase of three aws s3 sync calls plus bin/pydoc has grown linearly from 19m to 33m, with no markers in between

But it is not enough to attribute that ~14m of linear growth to any one of: aws sync of target-xcompile/doc/ (rustdoc HTML count keeps growing), aws sync of target/pydoc/, or bin/pydoc itself (the failing search-index step suggests Python doc volume is hitting a scaling limit).

Change

Sources misc/shlib/shlib.bash and adds eight ci_collapsed_heading lines, one before each existing command. No command changes, no behavior changes.
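
As a rough illustration of the pattern (not the actual diff), the sketch below shows how a heading helper can sit in front of each command. The body of ci_collapsed_heading here is an assumption: it relies only on Buildkite's documented convention that a log line starting with `--- ` opens a collapsed group, and the real helper in misc/shlib/shlib.bash may differ. The phase names and the build_docs / upload_docs functions are hypothetical placeholders for the script's existing commands.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumed stand-in for the helper in misc/shlib/shlib.bash.
# Buildkite treats a log line that starts with "--- " as the
# header of a collapsed group.
ci_collapsed_heading() {
    echo "--- $*"
}

# Hypothetical placeholders for the script's existing commands.
build_docs() { echo "building docs"; }
upload_docs() { echo "uploading docs"; }

# One heading before each command; the commands themselves are unchanged.
ci_collapsed_heading "Building docs"
build_docs

ci_collapsed_heading "Uploading docs"
upload_docs
```

Because each heading is a plain log line, the same markers serve both the Buildkite UI and any later log scraping.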

After a week of post-merge data we will know which phase is the linear-creep driver and can decide whether to invest in faster pdoc, S3 sync parallelism, or path-based skipping for non-docs commits.

Test plan

  • Buildkite log for the next deploy build on main shows eight collapsed sections with durations
  • No change to the artifacts uploaded to S3
  • bash -n ci/deploy/devsite.sh passes (verified locally)
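
A minimal sketch of how the local part of this plan could be scripted, assuming each marker appears as a line that starts with ci_collapsed_heading. The check_markers function is a hypothetical helper, not something in the repository.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: syntax-check a script and count its heading markers.
# Usage: check_markers <script> <expected-count>
check_markers() {
    local script="$1" expected="$2" actual
    # Parse the script without executing it.
    bash -n "$script"
    # grep -c exits non-zero when the count is 0, so tolerate that case.
    actual="$(grep -c '^[[:space:]]*ci_collapsed_heading ' "$script" || true)"
    if [ "$actual" -ne "$expected" ]; then
        echo "expected $expected markers, found $actual" >&2
        return 1
    fi
    echo "ok: $actual markers"
}
```

Run from the repository root, `check_markers ci/deploy/devsite.sh 8` would cover the syntax check and the marker count. The section durations and the S3 artifacts still need to be confirmed on a real build.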

Wrap each step of the dev-site deploy in `ci_collapsed_heading` so the
Buildkite UI shows per-phase sections and durations, and so log scraping
can pin down which sub-step is slow. The script currently runs ~45m end
to end with only two intermediate log markers (the two "Docs are in"
lines from bin/doc), which makes it impossible to attribute the ~14m of
growth over the past 60 days to bin/pydoc, the S3 syncs, or rustdoc
without instrumenting first.

No behavior change: only adds `. misc/shlib/shlib.bash` and eight
`ci_collapsed_heading` lines around the existing commands.