bazel: switch C++ toolchain to hermetic-llvm; add etc/bazel-hermetic #10812
oharboe wants to merge 18 commits
Replace toolchains_llvm + prebuilt LLVM release binaries with hermetic-llvm (BCR module 'llvm'): statically linked LLVM 22.1.8 binaries and a zero-sysroot cc_toolchain. No host compiler, linker, headers or libraries are involved; the toolchain runs unmodified on hosts without libxml2.so.2 (Ubuntu 25.10+, Arch, Fedora 41+, where prebuilt ld.lld could not start) and on non-FHS distros such as NixOS.
clang-tidy now comes from @llvm//tools:clang-tidy.
tcl_lang needs a one-line patch: tclZipfs.c's '#include "crypt.h"' resolved to glibc's crypt.h instead of the vendored minizip header once glibc headers became explicit -isystem directories. Drop the override when a fixed tcl_lang lands in BCR.
Tip of the hat to @dzbarsky for pointing us at hermetic-llvm: bazel-contrib/toolchains_llvm#795 (comment)
Signed-off-by: Øyvind Harboe <oyvind.harboe@zylin.com>
Launches bazel with an allowlisted PATH (no host compilers, linkers or
interpreters), BAZEL_DO_NOT_DETECT_CPP_TOOLCHAIN=1, and only the
workspace bazelrc, so builds fail instead of silently reaching for host
tools the build should provide. Proxy and cache environment variables
pass through; the bazelisk-managed bazel binary is resolved portably.
First finding: rules_python's legacy py_binary stub needs a host
python3 ('#!/usr/bin/env python3') before the hermetic interpreter
takes over. bootstrap_impl=script uses a shell stub instead.
Signed-off-by: Øyvind Harboe <oyvind.harboe@zylin.com>
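The allowlisted-PATH idea described in this commit can be sketched roughly as follows. This is a minimal Python model under stated assumptions, not the actual `etc/bazel-hermetic` script (which is a bash script): the function names `build_toolbox` and `pruned_env`, the allowlist contents, and the passthrough variable names are all hypothetical and chosen only for illustration.

```python
import os
import shutil


def build_toolbox(allowlist, search_path, toolbox_dir):
    """Symlink each allowlisted tool found on search_path into toolbox_dir.

    Names that do not resolve to an executable file on search_path are
    skipped, so a link can never point back at a bare name.
    """
    linked = []
    for name in allowlist:
        target = shutil.which(name, path=search_path)
        if target is None:
            continue
        os.symlink(os.path.realpath(target), os.path.join(toolbox_dir, name))
        linked.append(name)
    return linked


def pruned_env(toolbox_dir, base_env,
               passthrough=("HTTP_PROXY", "HTTPS_PROXY", "NO_PROXY")):
    """Return an environment whose PATH is only the toolbox directory.

    Only variables named in passthrough (hypothetical defaults here) are
    copied from base_env; everything else is dropped.
    """
    env = {"PATH": toolbox_dir}
    for key in passthrough:
        if key in base_env:
            env[key] = base_env[key]
    return env
```

A launcher built this way would start the build with `pruned_env(...)` as its whole environment, so a tool that is not in the toolbox is simply not found instead of being picked up from the host.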
Code Review
This pull request transitions the C++ toolchain from toolchains_llvm to hermetic-llvm (@llvm), updates associated configurations, documentation, and scripts, and introduces a new etc/bazel-hermetic script to run Bazel in a pruned environment. Feedback on the new hermeticity script highlights several critical issues: command -v can create circular symlinks for shell builtins, the ELF binary check breaks on macOS, essential environment variables for SSH and SSL are not preserved, and the use of exec bypasses the cleanup trap, leading to leaked temporary directories.
- type -P instead of command -v: builtins (pwd, test, true, printf) returned bare names and produced self-referential toolbox symlinks.
- Detect wrappers by bazelisk name or shebang instead of requiring ELF, so native macOS (Mach-O) bazel binaries are symlinked directly.
- Preserve SSH_AUTH_SOCK, SSL_CERT_FILE/DIR and TMPDIR alongside proxy and cache variables.
- Drop exec so the EXIT trap runs and the toolbox directory is not leaked on every invocation.
Signed-off-by: Øyvind Harboe <oyvind.harboe@zylin.com>
@hzeller Look what the cat dragged in! Hermetic LLVM or hallucination?
std::filesystem (src/tcl_readline_setup.cc) requires macOS 10.15. The previous 10.13 floor was never enforced: toolchains_llvm ignored --macos_minimum_os and compiled against the host SDK default. hermetic-llvm honors it, failing the Mac build with "'path' is unavailable: introduced in macOS 10.15". Signed-off-by: Øyvind Harboe <oyvind.harboe@zylin.com>
With bootstrap_impl=script, a py_binary (tclint) invoked via realpath cannot locate its runfiles after the lint scripts cd to the workspace. Export the runfiles root so nested tools resolve it from the environment. No-op under 'bazel test', where the runner sets it. Signed-off-by: Øyvind Harboe <oyvind.harboe@zylin.com>
@maliberty Look at what the cat (Claude) dragged in :-)
hermetic-llvm ships a deliberately minimal macOS SDK subset; frameworks beyond its six defaults are opt-in via the osx.frameworks extension tag. Qt's bootstrap failed on the first missing one (ApplicationServices/ApplicationServices.h). List the desktop frameworks qtbase links against. Signed-off-by: Øyvind Harboe <oyvind.harboe@zylin.com>
ATS (inside the ApplicationServices umbrella) ships .tbd symlinks into
the FontServices private framework. Without it in the subset they
dangle, and bazel rejects the whole sysroot as an action input
('The file type ... is not supported'). Zero dangling symlinks remain
in the extracted SDK after this.
Signed-off-by: Øyvind Harboe <oyvind.harboe@zylin.com>
This looks very promising! Still waiting for the build to complete, but it looks like this is the first time in 2 years that I don't have to comment out the llvm toolchain on NixOS, as it properly works!
hermetic-llvm's SDK extraction excludes usr/include/cups unless PrintCore is listed explicitly; naming only its ApplicationServices umbrella left PDEPluginInterface.h's '#import <cups/ppd.h>' dangling when Qt compiles Objective-C++. Signed-off-by: Øyvind Harboe <oyvind.harboe@zylin.com>
AppKit's NSColor.h includes CoreImage/CIColor.h unconditionally. Signed-off-by: Øyvind Harboe <oyvind.harboe@zylin.com>
Suggestions from review:
- tools/bazel is bazelisk's wrapper convention, so the pruned environment is applied automatically to every bazelisk invocation (Go bazelisk; the npm package does not implement the convention). $BAZEL_REAL is used when bazelisk provides it; direct invocation and BAZELISK_SKIP_WRAPPER=1 remain available.
- BAZEL_DO_NOT_DETECT_CPP_TOOLCHAIN=1 moves to .bazelrc so it applies to unwrapped invocations too.
- --host_action_env=PATH complements --action_env for exec-config actions. DISPLAY/WAYLAND_DISPLAY/XAUTHORITY pass through for 'bazel run' of GUI targets.
Signed-off-by: Øyvind Harboe <oyvind.harboe@zylin.com>
AppKit reaches all three on macOS: NSImageView.h includes Symbols/NSSymbolEffect.h unconditionally, the NSText headers take the non-UIKit __has_include branch into the UIFoundation private framework, and NSSharingService pulls CloudKit. Signed-off-by: Øyvind Harboe <oyvind.harboe@zylin.com>
Jenkins surfaced three problems with the first wrapper cut:
- --action_env=PATH broke test actions: the harness uses tee/file and standalone_python tests invoke host python3 (bazel's own test-setup.sh also uses file). Actions now keep bazel's strict default PATH; per-action pruning returns when the harness stops using host tools (see the PR cleanup list).
- An --action_env PATH naming a per-run mktemp directory would also fragment action cache keys across runs and machines.
- The npm bazelisk implements the tools/bazel wrapper convention too, so the wrapper's own 'bazelisk --version' probe re-entered the wrapper; probe with BAZELISK_SKIP_WRAPPER=1 and pass flags-only invocations (e.g. --version) through verbatim.
Signed-off-by: Øyvind Harboe <oyvind.harboe@zylin.com>
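The cache-key fragmentation named in the second point can be illustrated with a toy model. The sketch below is an assumption-laden simplification, not Bazel's actual digest computation: `action_key` is a hypothetical helper that hashes a command line together with its environment, and the toolbox paths are invented examples of what a per-run mktemp directory might look like.

```python
import hashlib


def action_key(argv, env):
    """Digest a command line plus its environment (a simplified model)."""
    digest = hashlib.sha256()
    for arg in argv:
        digest.update(arg.encode() + b"\0")
    for name in sorted(env):
        digest.update(f"{name}={env[name]}".encode() + b"\0")
    return digest.hexdigest()


compile_cmd = ["clang", "-c", "foo.cc", "-o", "foo.o"]

# Hypothetical per-run toolbox directories: same command, different PATH.
run_a = action_key(compile_cmd, {"PATH": "/tmp/toolbox.aB3xQ9"})
run_b = action_key(compile_cmd, {"PATH": "/tmp/toolbox.Zk71Lm"})

# A fixed PATH yields the same key on every run and machine.
fixed_1 = action_key(compile_cmd, {"PATH": "/bin:/usr/bin"})
fixed_2 = action_key(compile_cmd, {"PATH": "/bin:/usr/bin"})
```

In this model `run_a` and `run_b` differ even though the compile command is identical, which is the sense in which a per-run PATH would prevent cache hits between runs.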
@@ -0,0 +1,144 @@
#!/usr/bin/env bash
Is this script needed? It makes things very fragile.
For me, just invoking bazel exits immediately without any message. It looks like this is because it looks for bazelisk. But a bazel that exits immediately without any message is disconcerting.
The script looks good for testing whether the build reaches out to binaries outside the 'common' ones, but that is more a developer feature, not something we should force on people by default, as they will just give up.
Agreed. Plan: once the server build is clean, this moves to etc/bazel as an opt-in developer tool (no bazelisk auto-invocation); the toolchain-autodetection guard already lives in .bazelrc and stays. The silent exit you hit is the wrapper failing to resolve a non-bazelisk bazel — that failure mode disappears with the move.
Done in a8f0560 — the wrapper is now opt-in at etc/bazel; nothing is auto-invoked.
boost::icl's exclusive_less_than comparator violates strict weak ordering (boostorg/icl#51). libstdc++ tolerates it; libc++'s std::map does not: interval_set::operator-= subtracts only the first overlapping interval and leaves phantom coverage (boostorg/icl#55). With the hermetic toolchain's libc++, pad placed IO filler cells on top of pads (seven src/pad tests) and grt's overlapping_edges golden diverged.
Carry upstream boostorg/icl@7de3f55655 ('Refactored interval lookup operations', boostorg/icl#54, merged after boost 1.90), trimmed to include/, as a single_version_override patch.
Verified: an isolated interval_set subtraction repro is wrong with libc++ headers and correct with the patch at -O0 through -O3; all 61 //src/pad tests and //src/grt/test:overlapping_edges pass.
Drop the patch when a boost.icl release containing the fix lands in BCR.
Signed-off-by: Øyvind Harboe <oyvind.harboe@zylin.com>
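Why an "entirely to the left of" comparison on intervals cannot be a strict weak ordering can be shown with three intervals. The sketch below is a simplified Python model of the idea, assuming half-open integer intervals; it is not boost::icl's code, and `exclusive_less`, `equivalent` and `transitivity_violations` are hypothetical names.

```python
from itertools import permutations


def exclusive_less(a, b):
    """True when half-open interval a = [lo, hi) lies entirely left of b."""
    return a[1] <= b[0]


def equivalent(a, b):
    """Neither is less than the other; for intervals this means they overlap."""
    return not exclusive_less(a, b) and not exclusive_less(b, a)


def transitivity_violations(intervals):
    """Triples (a, b, c) with a ~ b and b ~ c but not a ~ c."""
    return [
        (a, b, c)
        for a, b, c in permutations(intervals, 3)
        if equivalent(a, b) and equivalent(b, c) and not equivalent(a, c)
    ]


# (1, 4) overlaps both neighbours, but (0, 2) and (3, 5) do not overlap.
violations = transitivity_violations([(0, 2), (1, 4), (3, 5)])
```

A strict weak ordering requires the induced equivalence to be transitive. Here `violations` is non-empty, so an ordered container keyed on such a comparator is relying on behavior the standard does not guarantee, and different standard library implementations may search it differently.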
hermetic-llvm injects Linux kernel and glibc headers as the first -isystem entries, outranking -isystem/-I directories of libraries that deliberately shadow libc headers. BCR sed (gnulib's stdio.h replacement) fails to compile (SETLOCALE_NULL_MAX, _GL_ATTRIBUTE_* undeclared) — breaking test/orfs/gcd targets on Jenkins — and tcl's vendored minizip crypt.h loses to glibc's crypt.h. Host sysroots provide libc via the default search path, searched after all user includes; carry a patch switching the two cc_args to -idirafter to restore that precedence. Verified: unpatched BCR sed 4.9 builds; //:openroad builds; 327 pad/grt/odb tests pass. Propose upstream after this stabilizes; drop the patch when released. Signed-off-by: Øyvind Harboe <oyvind.harboe@zylin.com>
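The precedence problem can be modeled as a first-match search over include directories. The sketch below is a simplification under stated assumptions: it covers only the -I, -isystem and -idirafter groups in that order (the compiler's built-in system directories, which sit between -isystem and -idirafter, are left out), and the directory names and the `resolve` helper are hypothetical.

```python
def resolve(header, include_dirs, isystem_dirs, idirafter_dirs, tree):
    """Return the first directory that provides header, in search order.

    Order modeled: -I directories, then -isystem directories, then
    -idirafter directories, each group in command-line order.
    """
    for directory in [*include_dirs, *isystem_dirs, *idirafter_dirs]:
        if header in tree.get(directory, ()):
            return directory
    return None


# Hypothetical layout: both glibc and a vendored library ship crypt.h.
tree = {
    "glibc/include": {"crypt.h", "stdio.h"},
    "vendored/minizip": {"crypt.h"},
}

# libc headers injected as the first -isystem entry win the lookup.
before = resolve("crypt.h", [], ["glibc/include", "vendored/minizip"], [], tree)

# Moving libc to -idirafter lets the vendored header shadow it again.
after = resolve("crypt.h", [], ["vendored/minizip"], ["glibc/include"], tree)

# Headers only libc provides are still found.
libc_only = resolve("stdio.h", [], ["vendored/minizip"], ["glibc/include"], tree)
```

In this model `before` resolves to the glibc directory and `after` to the vendored one, while `libc_only` still resolves to glibc, which matches the precedence the commit message says a host sysroot provides.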
Review feedback: bazelisk auto-invoking tools/bazel forced the pruned environment on everyone — a developer whose bazel is not bazelisk-managed got a silent exit, and repository rules of dependencies legitimately probe for host tools the wrapper prunes (GNU Make's configure looks for ld in bazel-orfs's gnumake rule, failing the Jenkins orfs targets). etc/bazel is invoked explicitly; the toolchain-autodetection guard stays in .bazelrc for everyone. Signed-off-by: Øyvind Harboe <oyvind.harboe@zylin.com>
Performance status
Verified so far
Not yet verified — should be checked before relying on QoR/runtime numbers
Btw one more thing - there is a libomp target in @llvm in case you want to use that instead of the older BCR module
OTOH, we should always be aware of people not using this particular toolchain, so we should not mingle actual dependencies (we need openmp) with accidental dependencies (because they happen to be in the toolchain).
I never understood the LLVM libomp dichotomy. Are they not joined at the hip?
OpenROAD is critically dependent on libomp for performance; there is no performance compromise possible there. Even a few percent is a dealbreaker.
It's not an implicit toolchain dependency, it's an explicit @llvm//libomp target that you add to deps just like the BCR target. The difference is that it's built from source instead of prebuilt. Just figured it's worth clarifying!
libc++ marks floating-point std::to_chars (used by std::format in OpenSTA) as introduced in macOS 13.3. The SDK's libc++ headers enforce availability annotations; earlier runs compiled against header sets with annotations disabled, so the 10.15 floor only appeared sufficient. Signed-off-by: Øyvind Harboe <oyvind.harboe@zylin.com>
Hmm, looks like the openmp in BCR is a from-source build you guys did, I must have been misremembering :)
Tried @llvm-project//openmp:libomp per the suggestion — works end to end (builds from source, links statically, |
Qt's Cocoa QPA plugin (qnsview.cpp) includes MetalKit/MetalKit.h. Signed-off-by: Øyvind Harboe <oyvind.harboe@zylin.com>
Running times: this PR vs recently merged PRs (Jenkins CI)
Longest-running tests, wall-clock as reported by
Observations:
MetalKit's MTKModel.h includes ModelIO/ModelIO.h; the MetalKit+ModelIO header closure resolves entirely within the subset now (verified by scanning their framework imports against the extracted tree). Signed-off-by: Øyvind Harboe <oyvind.harboe@zylin.com>
Field notes for maintainers
Reference for anyone touching the toolchain later; consolidates what was learned debugging this PR.
Root causes behind the carried patches
macOS
hermetic-llvm's SDK subset is opt-in per framework. Qt needs the full desktop set now listed in MODULE.bazel; a missing one fails as
etc/bazel
Opt-in launcher that prunes host toolchains from repository rules and fails instead of silently using them. It was briefly
OpenMP
The runtime comes from the BCR
It's best not to trust the CI for runtime benchmarking; run locally instead. If you do use CI numbers, exclude any time spent waiting for a machine.
I noticed that if you run the build twice in a row, there is still compilation being done. From Claude:
Replaces #10809 (and the earlier #9809).
Scope: this PR does one thing — swap the C++ toolchain — plus the minimum fallout fixes for CI to pass. It unblocks a list of cleanups (below) that are deliberately deferred; review this on the toolchain swap alone.
Tip of the hat to @dzbarsky for pointing us at hermetic-llvm: bazel-contrib/toolchains_llvm#795 (comment)
Problem
Prebuilt LLVM release binaries dynamically link ld.lld against libxml2.so.2. libxml2 2.14 changed its soname to libxml2.so.16, so on Ubuntu 25.10+/Arch/Fedora 41+ the linker cannot start without host workarounds (libxml2-dev + sudo ln), and with a compatibility symlink every link action prints "no version information available" (llvm/llvm-project#138225; fixed in LLVM release binaries from 23.1.0, scheduled 2026-08-25).
Change
- Replace toolchains_llvm + prebuilt LLVM release binaries with hermetic-llvm (BCR module llvm, v0.8.11): statically linked LLVM 22.1.8 binaries and a zero-sysroot cc_toolchain. No host compiler, linker, headers or libraries are involved, so the libxml2 problem ceases to exist rather than being worked around, and the toolchain also runs unmodified on non-FHS distros (NixOS). clang-tidy now comes from @llvm//tools:clang-tidy.
- tcl_lang one-line patch: tclZipfs.c's #include "crypt.h" resolved to glibc's crypt.h instead of the vendored minizip header once glibc headers became explicit -isystem directories. Dropped when a fixed tcl_lang lands in BCR.
- etc/bazel-hermetic: launches bazel with an allowlisted PATH (no host compilers/linkers/interpreters), BAZEL_DO_NOT_DETECT_CPP_TOOLCHAIN=1, and only the workspace bazelrc, so builds fail instead of silently reaching for host tools the build should provide. Proxy/cache env passes through; the bazelisk-managed bazel binary is resolved portably (incorporates the review feedback from #10809, "bazel: stub ld.lld's libxml2.so.2 via toolchains_llvm; add etc/bazel-hermetic"). First finding, fixed here: every py_binary needed a host python3 via rules_python's legacy stub shebang; bootstrap_impl=script removes that.
Verification
Ubuntu 26.04, no libxml2.so.2 on the host (the old sudo ln workaround deleted), gcc/cc/clang/ld/python3/perl/make/cmake all absent from the pruned PATH:
The binary targets glibc 2.28+, so it remains portable across distros.
Type of Change
Bug fix (build infrastructure)
I have verified that the local build succeeds
My code follows the repository's formatting guidelines
I have signed my commits (DCO).
Out of scope — follow-up cleanup once this merges
Unblocked by this PR, deliberately not in it. Each is its own concern and its own PR:
- etc/DependencyInstaller.sh::_install_bazel: the Ubuntu 26.04 libxml2-dev install and the libxml2.so.16 → libxml2.so.2 symlink written into /usr/lib/<arch>-linux-gnu/; drop libxml2 from the apt and yum-bazel package lists. Audit whether libc6-dev/glibc-devel and libtinfo6 are still needed under a zero-sysroot toolchain.
- … sudo ln workaround and installer symlink are obsolete and can be removed from affected systems.
- … sed/ncurses vs glibc 2.43 host headers) no longer reproduces (zero-sysroot removes the failure class) and close it; abandoned PR #10523 ("bazel: fix latent C++ compiler errors surfaced on Ubuntu 26.04 LTS") becomes moot.
- … @llvm-project//openmp:libomp (from source, version-locked to the compiler, per review suggestion). Verified working end to end (builds, links statically, set_thread_count engages), including the one-line dep change it needs in the OpenSTA submodule (src/sta/BUILD), which is why it is a follow-up PR rather than part of this one.
- … tcl_lang crypt.h patch when a fixed release lands in BCR.
- … toolchains_llvm workflows, and remove the NixOS "comment out the toolchain" advice (NixOS now works unmodified, per review feedback on this PR).
- … tools/bazel on a runner without libxml2.so.2/compilers so host-tool leaks cannot regress.
- test/regression_test.sh invokes host python3 for standalone_python tests and //:dup_id_test shells out to bare python3; once fixed, re-tighten tools/bazel with per-action PATH pruning (backed out for now; it also fragmented action cache keys via the per-run toolbox path).
Upstream PRs - after this merges
This PR carries two small local patches instead of waiting on upstream review cycles. That ordering is deliberate. Master today does not build on hosts without libxml2.so.2 (Ubuntu 25.10+, Arch, Fedora 41+) except by hand-editing /usr/lib; that is worth days of review, not weeks of upstream latency. Once merged and proven stable in daily use, the patches go upstream in the exact shape usage validated; if upstream review reshapes them, the follow-up here adopts what actually lands and drops the local copy.
- tcl_lang (BCR): the tclZipfs.c crypt.h include fix carried here as bazel/tcl-patches/.
- hermetic-llvm: propose the libc include-precedence change carried here as bazel/hermetic-llvm-patches/ (-isystem → -idirafter for kernel/glibc headers; fixes the gnulib/BCR-sed and vendored-header-shadowing class); drop the local patch when released.
- bazel-orfs: gnumake's repository rule runs GNU Make's configure, which probes PATH for ld; plumb the linker like CC is plumbed so the rule survives pruned-PATH environments (etc/bazel).
- boost.icl (BCR): the strict-weak-ordering fix carried here as bazel/boost-icl-patches/ is already merged upstream (boostorg/icl#54, "Refactored interval lookup operations"); drop the patch when a boost.icl release containing it reaches BCR. Worth a comment on boostorg/icl#55 ("interval_map::operator-= leaves phantom covering intervals") that boostorg/icl#54 resolves the interval_set case under libc++.
- hermetic-llvm: field report + fixes from this adoption: dangling .tbd symlinks when an umbrella sub-framework's private-framework target is not in the SDK subset (FontServices case), and the undocumented PrintCore ↔ usr/include/cups coupling.