
bazel: switch C++ toolchain to hermetic-llvm; add etc/bazel-hermetic #10812

Open
oharboe wants to merge 18 commits into The-OpenROAD-Project:master from oharboe:hermetic-llvm-toolchain

oharboe commented Jul 4, 2026
Collaborator

Replaces #10809 (and the earlier #9809).

Scope: this PR does one thing — swap the C++ toolchain — plus the minimum fallout fixes for CI to pass. It unblocks a list of cleanups (below) that are deliberately deferred; review this on the toolchain swap alone.

Tip of the hat to @dzbarsky for pointing us at hermetic-llvm: bazel-contrib/toolchains_llvm#795 (comment)

Problem

Prebuilt LLVM release binaries dynamically link ld.lld against libxml2.so.2. libxml2 2.14 changed its soname to libxml2.so.16, so on Ubuntu 25.10+, Arch and Fedora 41+ the linker cannot start without host workarounds (libxml2-dev + sudo ln). Even with a compatibility symlink in place, every link action prints a "no version information available" warning (llvm/llvm-project#138225; fixed in LLVM release binaries from 23.1.0, scheduled for 2026-08-25).
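To check whether a given host is affected, one could inspect the dynamic linker cache. The following is a minimal sketch, not part of this PR: the function names are hypothetical, and it assumes the usual `ldconfig -p` line format (`libname (arch) => path`).

```shell
# Hypothetical helpers (not from the PR). Both read `ldconfig -p`-style text
# on stdin, e.g.:
#   <tab>libxml2.so.16 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libxml2.so.16

# Print every libxml2 soname listed in the cache, one per line.
libxml2_sonames() {
    sed -n 's/^[[:space:]]*\(libxml2\.so\.[0-9][0-9]*\)[[:space:]].*/\1/p' | sort -u
}

# Succeed only when the legacy soname that prebuilt ld.lld links against exists.
has_legacy_libxml2() {
    libxml2_sonames | grep -qx 'libxml2\.so\.2'
}
```

A possible use is `ldconfig -p | has_legacy_libxml2 || echo "prebuilt ld.lld would not start here"`.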

Change

  • Switch the C++ toolchain to hermetic-llvm (BCR module llvm, v0.8.11): statically linked LLVM 22.1.8 binaries and a zero-sysroot cc_toolchain. No host compiler, linker, headers or libraries are involved — the libxml2 problem ceases to exist rather than being worked around, and the toolchain also runs unmodified on non-FHS distros (NixOS). clang-tidy now comes from @llvm//tools:clang-tidy.
  • tcl_lang one-line patch: tclZipfs.c's #include "crypt.h" resolved to glibc's crypt.h instead of the vendored minizip header once glibc headers became explicit -isystem directories. Dropped when a fixed tcl_lang lands in BCR.
  • etc/bazel-hermetic: launches bazel with an allowlisted PATH (no host compilers/linkers/interpreters), BAZEL_DO_NOT_DETECT_CPP_TOOLCHAIN=1, and only the workspace bazelrc — builds fail instead of silently reaching for host tools the build should provide. Proxy/cache env passes through; the bazelisk-managed bazel binary is resolved portably (incorporates the review feedback from bazel: stub ld.lld's libxml2.so.2 via toolchains_llvm; add etc/bazel-hermetic #10809). First finding, fixed here: every py_binary needed a host python3 via rules_python's legacy stub shebang; bootstrap_impl=script removes that.
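The allowlisted-PATH idea in the last bullet can be sketched as a directory of symlinks that contains only the permitted tools. This is an illustration under assumptions, not the script from this PR: `resolve_external` and `make_toolbox` are hypothetical names, and the PATH scan is a POSIX stand-in for looking up external executables only (it never returns a shell builtin).

```shell
# Resolve a tool name ($1) to an executable file on PATH, ignoring shell
# builtins and functions. Runs in a subshell so IFS and `set -f` do not leak.
resolve_external() (
    set -f                      # no globbing while PATH is split
    IFS=:
    for dir in $PATH; do
        [ -n "$dir" ] || dir=.  # an empty PATH entry means the current directory
        if [ -f "$dir/$1" ] && [ -x "$dir/$1" ]; then
            printf '%s\n' "$dir/$1"
            exit 0
        fi
    done
    exit 1
)

# Fill the toolbox directory ($1) with symlinks to the allowlisted tools
# (remaining arguments). Tools that cannot be resolved are skipped.
make_toolbox() (
    toolbox=$1
    shift
    mkdir -p "$toolbox"
    for tool in "$@"; do
        if target=$(resolve_external "$tool"); then
            ln -sf "$target" "$toolbox/$tool"
        fi
    done
)
```

Anything not named on the allowlist is simply absent from the toolbox, so a rule that probes for it fails visibly instead of picking up a host binary.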

Verification

Ubuntu 26.04, no libxml2.so.2 on the host (the old sudo ln workaround deleted), gcc/cc/clang/ld/python3/perl/make/cmake all absent from the pruned PATH:

$ etc/bazel-hermetic build //:openroad
INFO: Build completed successfully, 6404 total actions
$ grep -c "libxml2\|/usr/include" build.log
0
$ ldd bazel-bin/openroad | head -5
        linux-vdso.so.1
        libpthread.so.0 => /usr/lib/x86_64-linux-gnu/libpthread.so.0
        libdl.so.2 => /usr/lib/x86_64-linux-gnu/libdl.so.2
        libm.so.6 => /usr/lib/x86_64-linux-gnu/libm.so.6
        libc.so.6 => /usr/lib/x86_64-linux-gnu/libc.so.6

The binary targets glibc 2.28+, so it remains portable across distros.
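One way to sanity-check a glibc floor like this is to look at the versioned symbols the binary imports. The sketch below is an assumption about how such a check could be done, not something taken from this PR; `max_glibc_version` is a hypothetical name, and it expects `objdump -T`-style text on stdin.

```shell
# Hypothetical helper: print the highest GLIBC_x.y[.z] symbol version that
# appears in the input (for example the output of `objdump -T <binary>`).
max_glibc_version() {
    grep -o 'GLIBC_[0-9][0-9.]*' |
        sed 's/^GLIBC_//' |
        sort -u -t. -k1,1n -k2,2n -k3,3n |   # numeric sort per version component
        tail -n 1
}
```

For example, `objdump -T bazel-bin/openroad | max_glibc_version` would print the newest glibc symbol version the binary requires, which can then be compared against the stated floor.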

Type of Change

  • Bug fix (build infrastructure)

  • I have verified that the local build succeeds

  • My code follows the repository's formatting guidelines

  • I have signed my commits (DCO).

Out of scope — follow-up cleanup once this merges

Unblocked by this PR, deliberately not in it. Each is its own concern and its own PR:

  • Revert the #10778 workaround ("bazel: installer for ubuntu26") in etc/DependencyInstaller.sh::_install_bazel: the Ubuntu 26.04 libxml2-dev install and the libxml2.so.16 → libxml2.so.2 symlink written into /usr/lib/<arch>-linux-gnu/; drop libxml2 from the apt and yum -bazel package lists. Audit whether libc6-dev/glibc-devel and libtinfo6 are still needed under a zero-sysroot toolchain.
  • Comment on the closed issue #10761 ("Bazel build fails on Ubuntu 26.04: downloaded LLVM ld.lld depends on missing libxml2.so.2"): the sudo ln workaround and installer symlink are obsolete and can be removed from affected systems.
  • Verify that #9786 ("Bazel build fails on glibc 2.43"; BCR sed/ncurses vs glibc 2.43 host headers) no longer reproduces, since zero-sysroot removes the failure class, and close it; the abandoned PR #10523 ("bazel: fix latent C++ compiler errors surfaced on Ubuntu 26.04 LTS") then becomes moot.
  • Switch OpenMP to @llvm-project//openmp:libomp (from source, version-locked to the compiler, per review suggestion). Verified working end to end — builds, links statically, set_thread_count engages — including the one-line dep change it needs in the OpenSTA submodule (src/sta/BUILD), which is why it is a follow-up PR rather than part of this one.
  • Drop the tcl_lang crypt.h patch when a fixed release lands in BCR.
  • Update docs that still describe toolchains_llvm workflows, and remove the NixOS "comment out the toolchain" advice (NixOS now works unmodified, per review feedback on this PR).
  • Add a CI job that builds through tools/bazel on a runner without libxml2.so.2/compilers so host-tool leaks cannot regress.
  • Make the test harness hermetic: test/regression_test.sh invokes host python3 for standalone_python tests and //:dup_id_test shells out to bare python3; once fixed, re-tighten tools/bazel with per-action PATH pruning (backed out for now — it also fragmented action cache keys via the per-run toolbox path).
  • Note new QoR/perf baselines for #10336 ("Bazel- and CMake-built openroad from the same source produce different QoR"), since the compiler moves 20.1.8 → 22.1.8, before any bisection work.

Upstream PRs — after this merges

This PR carries two small local patches instead of waiting on upstream review
cycles. That ordering is deliberate. Master today does not build on hosts
without libxml2.so.2 (Ubuntu 25.10+, Arch, Fedora 41+) except by hand-editing
/usr/lib; that is worth days of review, not weeks of upstream latency. Once
merged and proven stable in daily use, the patches go upstream in the exact
shape usage validated — and if upstream review reshapes them, the follow-up
here adopts what actually lands and drops the local copy.

  • tcl_lang (BCR): the tclZipfs.c crypt.h include fix carried here as bazel/tcl-patches/.
  • hermetic-llvm: propose the libc include-precedence change carried here as bazel/hermetic-llvm-patches/ (-isystem → -idirafter for kernel/glibc headers; fixes the gnulib/BCR-sed and vendored-header-shadowing class); drop the local patch when released.
  • bazel-orfs: gnumake's repository rule runs GNU Make's configure, which probes PATH for ld; plumb the linker like CC is plumbed so the rule survives pruned-PATH environments (etc/bazel).
  • boost.icl (BCR): the strict-weak-ordering fix carried here as bazel/boost-icl-patches/ is already merged upstream (boostorg/icl#54, "Refactored interval lookup operations"); drop the patch when a boost.icl release containing it reaches BCR. Worth a comment on boostorg/icl#55 ("interval_map::operator-= leaves phantom covering intervals") that boostorg/icl#54 resolves the interval_set case under libc++.
  • hermetic-llvm: field report + fixes from this adoption: dangling .tbd symlinks when an umbrella sub-framework's private-framework target is not in the SDK subset (FontServices case), and the undocumented coupling between PrintCore and usr/include/cups.

oharboe added 2 commits July 4, 2026 15:00
Replace toolchains_llvm + prebuilt LLVM release binaries with
hermetic-llvm (BCR module 'llvm'): statically linked LLVM 22.1.8
binaries and a zero-sysroot cc_toolchain. No host compiler, linker,
headers or libraries are involved; the toolchain runs unmodified on
hosts without libxml2.so.2 (Ubuntu 25.10+, Arch, Fedora 41+, where
prebuilt ld.lld could not start) and on non-FHS distros such as NixOS.

clang-tidy now comes from @llvm//tools:clang-tidy.

tcl_lang needs a one-line patch: tclZipfs.c's '#include "crypt.h"'
resolved to glibc's crypt.h instead of the vendored minizip header once
glibc headers became explicit -isystem directories. Drop the override
when a fixed tcl_lang lands in BCR.

Tip of the hat to @dzbarsky for pointing us at hermetic-llvm:
bazel-contrib/toolchains_llvm#795 (comment)

Signed-off-by: Øyvind Harboe <oyvind.harboe@zylin.com>
Launches bazel with an allowlisted PATH (no host compilers, linkers or
interpreters), BAZEL_DO_NOT_DETECT_CPP_TOOLCHAIN=1, and only the
workspace bazelrc, so builds fail instead of silently reaching for host
tools the build should provide. Proxy and cache environment variables
pass through; the bazelisk-managed bazel binary is resolved portably.

First finding: rules_python's legacy py_binary stub needs a host
python3 ('#!/usr/bin/env python3') before the hermetic interpreter
takes over. bootstrap_impl=script uses a shell stub instead.

Signed-off-by: Øyvind Harboe <oyvind.harboe@zylin.com>
@oharboe oharboe requested a review from a team as a code owner July 4, 2026 13:01
@oharboe oharboe requested a review from osamahammad21 July 4, 2026 13:01
@github-actions github-actions Bot added the size/M label Jul 4, 2026

gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request transitions the C++ toolchain from toolchains_llvm to hermetic-llvm (@llvm), updates associated configurations, documentation, and scripts, and introduces a new etc/bazel-hermetic script to run Bazel in a pruned environment. Feedback on the new hermeticity script highlights several critical issues: command -v can create circular symlinks for shell builtins, the ELF binary check breaks on macOS, essential environment variables for SSH and SSL are not preserved, and the use of exec bypasses the cleanup trap, leading to leaked temporary directories.

Comment thread etc/bazel
Comment thread etc/bazel-hermetic Outdated
Comment thread etc/bazel-hermetic Outdated
Comment thread etc/bazel-hermetic Outdated
- type -P instead of command -v: builtins (pwd, test, true, printf)
  returned bare names and produced self-referential toolbox symlinks.
- Detect wrappers by bazelisk name or shebang instead of requiring ELF,
  so native macOS (Mach-O) bazel binaries are symlinked directly.
- Preserve SSH_AUTH_SOCK, SSL_CERT_FILE/DIR and TMPDIR alongside proxy
  and cache variables.
- Drop exec so the EXIT trap runs and the toolbox directory is not
  leaked on every invocation.

Signed-off-by: Øyvind Harboe <oyvind.harboe@zylin.com>
oharboe commented Jul 4, 2026
Collaborator Author

@hzeller Look what the cat dragged in! Hermetic LLVM or hallucination?

oharboe added 2 commits July 4, 2026 15:27
std::filesystem (src/tcl_readline_setup.cc) requires macOS 10.15. The
previous 10.13 floor was never enforced: toolchains_llvm ignored
--macos_minimum_os and compiled against the host SDK default.
hermetic-llvm honors it, failing the Mac build with
"'path' is unavailable: introduced in macOS 10.15".

Signed-off-by: Øyvind Harboe <oyvind.harboe@zylin.com>
With bootstrap_impl=script, a py_binary (tclint) invoked via realpath
cannot locate its runfiles after the lint scripts cd to the workspace.
Export the runfiles root so nested tools resolve it from the
environment. No-op under 'bazel test', where the runner sets it.

Signed-off-by: Øyvind Harboe <oyvind.harboe@zylin.com>
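A minimal sketch of exporting a runfiles root before changing directory, under assumptions: the commit does not say which variable it exports, so this uses Bazel's conventional RUNFILES_DIR and the `<binary>.runfiles` directory layout; `export_runfiles_root` is a hypothetical name.

```shell
# Hypothetical helper: make sure RUNFILES_DIR is exported before the caller
# changes directory. $1 is the path of the launcher binary.
export_runfiles_root() {
    # Respect a value that is already set (for example by a test runner).
    if [ -n "${RUNFILES_DIR:-}" ]; then
        return 0
    fi
    # Assumed layout: the runfiles tree sits next to the binary.
    if [ -d "$1.runfiles" ]; then
        RUNFILES_DIR=$(cd "$1.runfiles" && pwd -P)
        export RUNFILES_DIR
        return 0
    fi
    return 1
}
```

Because the path is made absolute before it is exported, a later `cd` in the calling script does not invalidate it.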
oharboe commented Jul 4, 2026
Collaborator Author

@maliberty Look at what the cat (Claude) dragged in :-)

dzbarsky left a comment

Very cool!

Comment thread tools/bazel Outdated
Comment thread etc/bazel-hermetic Outdated
Comment thread etc/bazel
Comment thread MODULE.bazel
oharboe added 2 commits July 4, 2026 15:47
hermetic-llvm ships a deliberately minimal macOS SDK subset; frameworks
beyond its six defaults are opt-in via the osx.frameworks extension
tag. Qt's bootstrap failed on the first missing one
(ApplicationServices/ApplicationServices.h). List the desktop
frameworks qtbase links against.

Signed-off-by: Øyvind Harboe <oyvind.harboe@zylin.com>
ATS (inside the ApplicationServices umbrella) ships .tbd symlinks into
the FontServices private framework. Without it in the subset they
dangle, and bazel rejects the whole sysroot as an action input
('The file type ... is not supported'). Zero dangling symlinks remain
in the extracted SDK after this.

Signed-off-by: Øyvind Harboe <oyvind.harboe@zylin.com>
hzeller commented Jul 4, 2026
Collaborator

This looks very promising!
As I said before, the only way to get an actual hermetic toolchain is to bootstrap it.

Still waiting for the build to complete but it looks like this is the first time in 2 years that I don't have to comment out the llvm toolchain on nixos, as it properly works!

oharboe added 5 commits July 4, 2026 16:19
hermetic-llvm's SDK extraction excludes usr/include/cups unless
PrintCore is listed explicitly; naming only its ApplicationServices
umbrella left PDEPluginInterface.h's '#import <cups/ppd.h>' dangling
when Qt compiles Objective-C++.

Signed-off-by: Øyvind Harboe <oyvind.harboe@zylin.com>
AppKit's NSColor.h includes CoreImage/CIColor.h unconditionally.

Signed-off-by: Øyvind Harboe <oyvind.harboe@zylin.com>
Suggestions from review:
- tools/bazel is bazelisk's wrapper convention, so the pruned
  environment is applied automatically to every bazelisk invocation
  (Go bazelisk; the npm package does not implement the convention).
  $BAZEL_REAL is used when bazelisk provides it; direct invocation and
  BAZELISK_SKIP_WRAPPER=1 remain available.
- BAZEL_DO_NOT_DETECT_CPP_TOOLCHAIN=1 moves to .bazelrc so it applies
  to unwrapped invocations too.
- --host_action_env=PATH complements --action_env for exec-config
  actions.

DISPLAY/WAYLAND_DISPLAY/XAUTHORITY pass through for 'bazel run' of GUI
targets.

Signed-off-by: Øyvind Harboe <oyvind.harboe@zylin.com>
AppKit reaches all three on macOS: NSImageView.h includes
Symbols/NSSymbolEffect.h unconditionally, the NSText headers take the
non-UIKit __has_include branch into the UIFoundation private framework,
and NSSharingService pulls CloudKit.

Signed-off-by: Øyvind Harboe <oyvind.harboe@zylin.com>
Jenkins surfaced three problems with the first wrapper cut:

- --action_env=PATH broke test actions: the harness uses tee/file and
  standalone_python tests invoke host python3 (bazel's own
  test-setup.sh also uses file). Actions now keep bazel's strict
  default PATH; per-action pruning returns when the harness stops
  using host tools (see the PR cleanup list).
- An --action_env PATH naming a per-run mktemp directory would also
  fragment action cache keys across runs and machines.
- The npm bazelisk implements the tools/bazel wrapper convention too,
  so the wrapper's own 'bazelisk --version' probe re-entered the
  wrapper; probe with BAZELISK_SKIP_WRAPPER=1 and pass flags-only
  invocations (e.g. --version) through verbatim.

Signed-off-by: Øyvind Harboe <oyvind.harboe@zylin.com>
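The "flags-only" pass-through from the last bullet could be detected roughly as follows. This is a sketch with a hypothetical function name, not the wrapper's actual code.

```shell
# Hypothetical helper: succeed when every argument starts with '-', i.e. the
# invocation names no bazel command such as `build` or `test`.
# An empty argument list also counts as flags-only.
is_flags_only() {
    for arg in "$@"; do
        case $arg in
            -*) ;;            # a flag: keep scanning
            *) return 1 ;;    # first non-flag word: a command is present
        esac
    done
    return 0
}
```

A wrapper could then forward `--version` and similar invocations verbatim and apply the pruned environment only when a command is present. A startup option written with a separate value (two words) would be classified as a command by this check, which errs on the side of applying the wrapper.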
@maliberty maliberty requested review from maliberty and removed request for osamahammad21 July 4, 2026 16:03
Comment thread etc/bazel
@@ -0,0 +1,144 @@
#!/usr/bin/env bash

Collaborator

Is this script needed? It makes things very fragile.

For me, just invoking bazel exits immediately without any message. It looks like this is because it looks for bazelisk. But a bazel that exits immediately without any message is disconcerting.

The script looks good for testing whether the build reaches out to binaries outside the 'common' ones, but that is more of a developer feature, not something we should force on people by default, as they will just give up.

Collaborator Author

Agreed. Plan: once the server build is clean, this moves to etc/bazel as an opt-in developer tool (no bazelisk auto-invocation); the toolchain-autodetection guard already lives in .bazelrc and stays. The silent exit you hit is the wrapper failing to resolve a non-bazelisk bazel — that failure mode disappears with the move.

Collaborator Author

Done in a8f0560 — the wrapper is now opt-in at etc/bazel; nothing is auto-invoked.

boost::icl's exclusive_less_than comparator violates strict weak
ordering (boostorg/icl#51). libstdc++ tolerates it; libc++'s std::map
does not: interval_set::operator-= subtracts only the first overlapping
interval and leaves phantom coverage (boostorg/icl#55). With the
hermetic toolchain's libc++, pad placed IO filler cells on top of pads
(seven src/pad tests) and grt's overlapping_edges golden diverged.

Carry upstream boostorg/icl@7de3f55655 ('Refactored interval lookup
operations', boostorg/icl#54, merged after boost 1.90), trimmed to include/, as a
single_version_override patch. Verified: an isolated interval_set
subtraction repro is wrong with libc++ headers and correct with the
patch at -O0 through -O3; all 61 //src/pad tests and
//src/grt/test:overlapping_edges pass. Drop the patch when a boost.icl
release containing the fix lands in BCR.

Signed-off-by: Øyvind Harboe <oyvind.harboe@zylin.com>
@github-actions github-actions Bot added size/XL and removed size/M labels Jul 4, 2026
hermetic-llvm injects Linux kernel and glibc headers as the first
-isystem entries, outranking -isystem/-I directories of libraries that
deliberately shadow libc headers. BCR sed (gnulib's stdio.h
replacement) fails to compile (SETLOCALE_NULL_MAX, _GL_ATTRIBUTE_*
undeclared) — breaking test/orfs/gcd targets on Jenkins — and tcl's
vendored minizip crypt.h loses to glibc's crypt.h. Host sysroots
provide libc via the default search path, searched after all user
includes; carry a patch switching the two cc_args to -idirafter to
restore that precedence. Verified: unpatched BCR sed 4.9 builds;
//:openroad builds; 327 pad/grt/odb tests pass. Propose upstream after
this stabilizes; drop the patch when released.

Signed-off-by: Øyvind Harboe <oyvind.harboe@zylin.com>
Review feedback: bazelisk auto-invoking tools/bazel forced the pruned
environment on everyone — a developer whose bazel is not
bazelisk-managed got a silent exit, and repository rules of
dependencies legitimately probe for host tools the wrapper prunes
(GNU Make's configure looks for ld in bazel-orfs's gnumake rule,
failing the Jenkins orfs targets). etc/bazel is invoked explicitly;
the toolchain-autodetection guard stays in .bazelrc for everyone.

Signed-off-by: Øyvind Harboe <oyvind.harboe@zylin.com>
oharboe commented Jul 4, 2026
Collaborator Author

Performance status

Verified so far

  • Full regression suite: 1656/1656 tests pass under the new toolchain (bazelisk test //src/..., Ubuntu 26.04), with no test-timeout regressions at the configured timeout tiers.
  • OpenMP, by inspection: the runtime comes from the BCR openmp module and is statically linked — ldd bazel-bin/openroad shows no libomp/libgomp, nm shows the __kmpc_* entry points embedded — and it engages at runtime:
    > set_thread_count 8
    [INFO ORD-0030] Using 8 thread(s).
    
    No host OpenMP is involved. Note the version skew: libomp 21.1.5 with clang 22.1.8 (libomp keeps forward ABI compatibility; a matching 22.x module bump is in the cleanup list).
  • Optimization configuration is unchanged: same -c opt default, same --config=opt (-O3 + LTO) flags; the compiler moves 20.1.8 → 22.1.8.
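The "by inspection" OpenMP check above can be turned into two small filters. This is a sketch with hypothetical function names; it only pattern-matches text produced by `ldd` and `nm` and makes no claim about how the PR author ran the check.

```shell
# Reads `ldd` output on stdin; succeeds when no OpenMP runtime (libomp or
# libgomp) is listed as a dynamic dependency.
no_dynamic_openmp() {
    ! grep -Eq 'lib(omp|gomp)\.so'
}

# Reads `nm` output on stdin; succeeds when LLVM OpenMP entry points
# (symbols prefixed __kmpc_) are present in the binary itself.
has_embedded_kmpc() {
    grep -q '__kmpc_'
}
```

Possible use: `ldd bazel-bin/openroad | no_dynamic_openmp && nm bazel-bin/openroad | has_embedded_kmpc`.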

Not yet verified — should be checked before relying on QoR/runtime numbers

dzbarsky commented Jul 4, 2026

Btw one more thing - there is a libomp target in @llvm in case you want to use that instead of the older BCR module

hzeller commented Jul 4, 2026
Collaborator

> Btw one more thing - there is a libomp target in @llvm in case you want to use that instead of the older BCR module

OTOH, we should always be aware of people not using this particular toolchain, so we should not mingle actual dependencies (we need openmp) with accidental dependencies (because they happen to be in the toolchain).
Rather, we should update openmp upstream in BCR if needed.

oharboe commented Jul 4, 2026
Collaborator Author

I never understood the LLVM libomp dichotomy. Are they not joined at the hip?

oharboe commented Jul 4, 2026
Collaborator Author

OpenROAD is critically dependent on libomp for performance; there is no performance compromise possible there. Even a few percent is a dealbreaker.

dzbarsky commented Jul 4, 2026

> > Btw one more thing - there is a libomp target in @llvm in case you want to use that instead of the older BCR module
>
> OTOH, we should always be aware of people not using this particular toolchain, so we should not mingle actual dependencies (we need openmp) with accidental dependencies (because they happen to be in the toolchain). Rather, we should update openmp upstream in BCR if needed.

It's not an implicit toolchain dependency, it's an explicit @llvm//libomp target that you add to deps just like the BCR target. The difference is that it's built from source instead of prebuilt. Just figured it's worth clarifying!

libc++ marks floating-point std::to_chars (used by std::format in
OpenSTA) as introduced in macOS 13.3. The SDK's libc++ headers enforce
availability annotations; earlier runs compiled against header sets
with annotations disabled, so the 10.15 floor only appeared
sufficient.

Signed-off-by: Øyvind Harboe <oyvind.harboe@zylin.com>
dzbarsky commented Jul 4, 2026

Hmm, looks like the openmp in BCR is a from-source build you guys did, I must have been misremembering :)

oharboe commented Jul 4, 2026
Collaborator Author

Tried @llvm-project//openmp:libomp per the suggestion — works end to end (builds from source, links statically, set_thread_count 8 engages, no dynamic libomp). It needs a one-line dep change in the OpenSTA submodule (its BUILD comments doubted @OpenMP was needed, but Eigen includes omp.h under -fopenmp), so it goes in a follow-up PR to keep this one single-concern. The verified diff is ready.

Qt's Cocoa QPA plugin (qnsview.cpp) includes MetalKit/MetalKit.h.

Signed-off-by: Øyvind Harboe <oyvind.harboe@zylin.com>
oharboe commented Jul 4, 2026
Collaborator Author

Running times: this PR vs recently merged PRs (Jenkins CI)

Longest-running tests, wall-clock as reported by bazel test on the public Jenkins fleet. Baselines are the two most recently merged PRs (#10810, #10811, both master + a trivial diff); this PR's numbers are from run 10 (the first fully-passing test run; later runs reuse cached results).

| test | master (#10810) | master (#10811) | this PR |
|---|---|---|---|
| //test:upf_aes-tcl_test | 256.7s | 486.8s | 440.0s |
| //src/rmp/test:aes_genetic-tcl_test | 147.1s | 174.2s | 135.0s |
| //src/rmp/test:aes_annealing-tcl_test | 51.0s | 102.4s | 39.4s |
| //src/rmp/test:gcd_genetic-tcl_test | 45.0s | 105.6s | 34.2s |
| //src/cgt/test:ibex_sky130hd-tcl_test | 40.1s | 72.8s | 57.9s |
| //src/rsz/test:repair_setup_reroute1-tcl_test | 20.8s | 38.5s | 26.1s |

Observations:

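  • On three of the six tests (upf_aes, ibex_sky130hd, repair_setup_reroute1), this PR's time falls within the range spanned by master's own two baselines; on the three rmp tests (aes_genetic, aes_annealing, gcd_genetic) it is below both baselines.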
  • Master's baselines differ from each other by up to 2.3× on identical code (gcd_genetic: 45.0s vs 105.6s), so CI wall-clock under --jobs=200 test parallelism measures machine contention more than binary speed. Within that noise, there is no detectable runtime regression — and no detectable improvement either.
  • A controlled comparison (same dedicated machine, repeated runs, old vs new toolchain binaries) is the only way to resolve differences smaller than this noise band; that remains open as noted in the performance comment above.
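The arithmetic behind these observations can be reproduced with a few lines of awk. This is only a sketch for re-checking the table by hand; the function names are hypothetical.

```shell
# Ratio between the slower and the faster of two baseline timings ($1, $2),
# printed with one decimal.
baseline_spread() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        hi = (a > b) ? a : b
        lo = (a > b) ? b : a
        printf "%.1f\n", hi / lo
    }'
}

# Where a measurement ($3) sits relative to the two baselines ($1, $2):
# prints "below", "within" or "above".
classify() {
    awk -v a="$1" -v b="$2" -v x="$3" 'BEGIN {
        hi = (a > b) ? a : b
        lo = (a > b) ? b : a
        if (x < lo) print "below"
        else if (x > hi) print "above"
        else print "within"
    }'
}

baseline_spread 45.0 105.6      # gcd_genetic baselines: prints 2.3
classify 256.7 486.8 440.0      # upf_aes: prints within
```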

MetalKit's MTKModel.h includes ModelIO/ModelIO.h; the MetalKit+ModelIO
header closure resolves entirely within the subset now (verified by
scanning their framework imports against the extracted tree).

Signed-off-by: Øyvind Harboe <oyvind.harboe@zylin.com>
oharboe commented Jul 5, 2026
Collaborator Author

Field notes for maintainers

Reference for anyone touching the toolchain later; consolidates what was learned debugging this PR.

Root causes behind the carried patches

| Symptom | Root cause | Fix carried here | Upstream status |
|---|---|---|---|
| pad placed IO fillers on pads (7 tests); grt overlapping_edges golden diff | boost::icl's exclusive_less_than violates strict weak ordering (boostorg/icl#51); libstdc++ tolerates it, libc++ does not: interval_set::operator-= leaves phantom coverage (boostorg/icl#55). Surfaced because this PR switches the stdlib from host libstdc++ to libc++. | bazel/boost-icl-patches/ | Fixed upstream (boostorg/icl#54, post-1.90); drop when a release reaches BCR |
| BCR sed fails to compile (SETLOCALE_NULL_MAX), breaking test/orfs/gcd; tcl needed a crypt.h patch | hermetic-llvm injects kernel/glibc headers as the first -isystem, outranking libraries that deliberately shadow libc headers (gnulib's stdio.h, tcl's vendored minizip crypt.h). Host sysroots are searched last. | bazel/hermetic-llvm-patches/ (-isystem → -idirafter) | To propose upstream |
| ld.lld: libxml2.so.2 not found (the original bug) | Prebuilt LLVM release binaries link libxml2 dynamically; fixed in LLVM release binaries from 23.1.0 (llvm/llvm-project#138225) | Gone by construction: statically linked toolchain | n/a |

macOS

hermetic-llvm's SDK subset is opt-in per framework. Qt needs the full desktop set now listed in MODULE.bazel; a missing one fails as fatal error: 'X/Y.h' file not found — add X to osx.frameworks, refetch @@llvm++osx+macos_sdk, and check find sysroot -xtype l for dangling .tbd symlinks (that's how FontServices was found). The deployment floor is 13.3 because libc++ availability-annotates floating-point std::to_chars (std::format, used by OpenSTA); the old 10.13/10.15 floors only ever "worked" because toolchains_llvm didn't enforce them.
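The dangling-symlink check mentioned above uses `-xtype l`, which is a GNU find extension. A portable variant, as a sketch with a hypothetical function name:

```shell
# List symlinks under the directory $1 whose target does not exist.
# `test -e` follows the link, so it fails exactly for dangling ones.
dangling_symlinks() {
    find "$1" -type l ! -exec test -e {} \; -print
}
```

An empty result for the extracted SDK directory means every .tbd symlink resolves inside the subset.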

etc/bazel

Opt-in launcher that prunes host toolchains from repository rules and fails instead of silently using them. It was briefly tools/bazel (bazelisk auto-invoke); reverted because non-bazelisk setups exited silently, the test harness still uses host python3/tee/file, and a per-run PATH in --action_env fragments action-cache keys. Re-tightening is in the cleanup list, gated on a hermetic test harness.

OpenMP

The runtime comes from the BCR openmp module, statically linked; verified engaging under set_thread_count. Switching to @llvm-project//openmp:libomp (version-locked to the compiler) is verified working end to end but needs a one-line dep change in the OpenSTA submodule — follow-up PR. Note OpenSTA's # Is needed? Nobody includes omp.h comment is wrong: Eigen includes omp.h under -fopenmp.

@maliberty

Member

> Master's baselines differ from each other by up to 2.3× on identical code

It's best not to trust the CI for runtime benchmarking and to run locally instead. Also, if using the CI, be careful to exclude any time spent waiting for a machine.

@maliberty

Member

I noticed that if you run the build twice in a row, there is still compilation being done. From Claude:

  Why it re-analyzes every time

  Line 37 creates a fresh random directory on every invocation:

  TOOLBOX=$(mktemp -d "${TMPDIR:-/tmp}/bazel-hermetic.XXXXXX")

  and line 105 feeds that path into Bazel as a repository environment variable:

  extra_flags=( "--repo_env=PATH=${TOOLBOX}" )

  --repo_env is part of Bazel's analysis/configuration key. When its value
  changes between runs, Bazel invalidates the loading+analysis cache and re-runs
  it — which is exactly your 514 packages loaded, 86111 targets configured on
  every run. On a truly warm no-op build you'd instead see (0 packages loaded, 0
  targets configured).

  Because mktemp -d mints a new …/bazel-hermetic.XXXXXX path each time, the
  --repo_env=PATH= value is different every single invocation, so the cache
  never hits.

  There's real irony here: the wrapper's own comment (lines 99–104) explains
  they deliberately avoided --action_env PATH=${TOOLBOX} precisely because a
  per-run directory "would fragment action cache keys across runs." The same
  fragmentation is happening to the repo/analysis cache via --repo_env — just a
  different cache.

  Note on what you're seeing

  The [Prepa] … (200 actions, 0 running) lines with 0 running mean it's not
  actually recompiling — those are cache hits being re-prepared as Bazel walks
  the (re-analyzed) action graph. So you're paying the analysis + graph-setup
  cost each time, not full recompilation. Still slow and wasteful, but your
  object files aren't being rebuilt.

  The fix

  Make TOOLBOX a stable, deterministic path so --repo_env stops changing. Key it
  off the contents that actually define the toolbox (the allowlist + resolved
  bazel binary), and reuse it if it already exists. For example:

  # Deterministic toolbox: stable across runs so --repo_env=PATH doesn't
  # fragment the analysis cache. Keyed on the allowlist + bazel binary so it
  # rebuilds only when the contract changes.
  key=$(printf '%s\n' "${ALLOWED_TOOLS[@]}" "${BAZEL}" | sha256sum | cut -c1-16)
  TOOLBOX="${XDG_CACHE_HOME:-$HOME/.cache}/bazel-hermetic/${key}"

  Two things to note if you do this:
  - BAZEL is resolved further down (lines 53–80), so the toolbox setup that
  builds the symlinks needs to move after that resolution, or key the hash off
  ALLOWED_TOOLS alone.
  - Drop the mktemp + trap 'rm -rf' EXIT cleanup (or make it a no-op) since the
  dir is now persistent and reused. Rebuild the symlinks each run (cheap) or
  skip if the dir already exists.
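A variant of the suggested key computation, keyed on the allowlist alone (the second option in the first note) and tolerant of hosts without sha256sum. This is a sketch under assumptions: `sha256_hex` and `toolbox_key` are hypothetical names, and `shasum -a 256` is assumed as the fallback on hosts such as macOS.

```shell
# Hash stdin with whichever SHA-256 tool is available and print the hex digest.
sha256_hex() {
    if command -v sha256sum >/dev/null 2>&1; then
        sha256sum | cut -d' ' -f1
    else
        shasum -a 256 | cut -d' ' -f1   # assumed fallback, e.g. on macOS
    fi
}

# Stable 16-hex-digit key derived from the allowlisted tool names (arguments).
# The same list always yields the same key, so a PATH built from it does not
# change between runs.
toolbox_key() {
    printf '%s\n' "$@" | sha256_hex | cut -c1-16
}
```

With this, a toolbox directory such as `${XDG_CACHE_HOME:-$HOME/.cache}/bazel-hermetic/$(toolbox_key ...)` would stay identical across invocations, so the value passed via --repo_env=PATH would no longer differ from run to run.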

Development

Successfully merging this pull request may close these issues.

Bazel build fails on Ubuntu 26.04: downloaded LLVM ld.lld depends on missing libxml2.so.2

4 participants