Bootstrapping
- Bootstrappable Builds (GNU Guix focus)
- bootstrapping wiki
- David A. Wheeler’s Page on Fully Countering Trusting Trust through Diverse Double-Compiling (DDC) - Countering Trojan Horse attacks on Compilers (Note: Requires trustworthy bootstrap compiler(s) as starting point)
- Mike Stute's answer to What is a coder's worst nightmare?
Reasons
- Security
- See the "Backdoors inserted into dev tools" section
- Portability
- Binary executables have much higher bitrot than source code and keeping obsolete binary interfaces often means keeping known security issues.
- Maintainability
- Making sure someone else can actually continue maintaining the software, whether as the canonical version or as a fork
- Reproducibility's other side of the coin
- One effect of reproducibility is that it allows auditing source code instead of binaries, but only if said source code is what actually gets built.
Backdoors inserted into dev tools
This is by no means an exhaustive list, mostly because it happens way too regularly on npm.
Ken Thompson “Reflections on Trusting Trust” Compiler
In chronological order:
- 1983-10: Reflections on Trusting Trust, Ken Thompson
- 2022-07-27: Ken Thompson Really Did Launch His "Trusting Trust" Trojan Attack in Real Life
- 2023-03-12: Q&A section after Ken Thompson talk on keeping his music collection (Talk page)
- 2023-10-25: research!rsc: Running the “Reflections on Trusting Trust” Compiler: This notably contains the code that Ken Thompson used, together with explanations
CVE-2024-3094: Jia Tan backdoor in xz-utils
This one is noteworthy for primarily being an insertion of a payload into a ./configure script generated by autotools rather than a binary; for being a near-successful attack on OpenSSH when patched to link with libsystemd (which pulls in liblzma); and for having been detected pretty much by a combination of sheer curiosity and a lucky pre-existing benchmark.
- Discovery by Andres Freund
- https://tukaani.org/xz-backdoor/
- Detailed FAQ on the xz-utils backdoor by Sam James
Proofs of Concept
- Reflections on Rusting Trust: Backdooring The One True Rust Compiler.
Definitions
- Source Code
- As shared by GPLv3, the Free Software Definition, and the Open-Source Definition, each with small differences. The latter is reproduced below as it explicitly excludes obfuscation and codegen:
[The] Source code must be the preferred form in which a programmer would modify the program. Deliberately obfuscated source code is not allowed. Intermediate forms such as the output of a preprocessor or translator are not allowed.
Tools
- deblob
- Removes files in known binary executable formats (including bytecode). Designed to be fast enough to barely impact distro-scale package building performance; cannot detect all blobs.
- Debian's suspicious-source script
- Lists files whose type isn't in a list of known source code formats, good for manual audits. Being Python + magic(5) means it is quite slow.
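To give an idea of how this kind of detection works, here is a minimal sketch in Python that walks a source tree and flags files starting with a known "magic" signature. It is not how deblob or suspicious-source are actually implemented: the function names `identify_blob` and `find_blobs` are made up for this example, and the signature table is a tiny, illustrative subset.

```python
import os

# Illustrative, far from complete: leading "magic" bytes of a few binary formats.
MAGIC_SIGNATURES = {
    b"\x7fELF": "ELF executable or object",
    b"MZ": "DOS/PE executable (.exe, .dll)",
    b"\xca\xfe\xba\xbe": "Java class file (or Mach-O fat binary)",
    b"\x00asm": "WebAssembly module",
}


def identify_blob(path):
    """Return a description if the file starts with a known signature, else None."""
    with open(path, "rb") as f:
        head = f.read(8)
    for magic, description in MAGIC_SIGNATURES.items():
        if head.startswith(magic):
            return description
    return None


def find_blobs(root):
    """Walk a source tree, return {relative_path: description} for suspected blobs."""
    found = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that isn't a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            description = identify_blob(path)
            if description is not None:
                found[os.path.relpath(path, root)] = description
    return found
```

Calling `find_blobs` on an unpacked tarball returns the suspects, which then need a manual look: short signatures like `MZ` can match plain text files, and formats missing from the table go undetected.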
Disclaimers
Unless you plan to make a systems language, bootstrapping doesn't have to be done from C or a low-level language. It's fine to use any language as long as it's also bootstrappable and wouldn't introduce circular dependencies.
For example: Go and Lua are fine. Perl and Python are a bit involved but live-bootstrap got both. Any language supported by GCC is also okay, although a simpler backend such as QBE could be more interesting.
Non-Problematic / Praise
Go
Installing Go from source in the official Go documentation details it: both gccgo and a chain branching out of Go 1.4 are supported.
More recent versions do require a bootstrap chain but as Go's own toolchain is standalone it's fine.
Historically problematic
Firefox >=68 <=78
Firefox would bundle python2 and refuse to build if removed. See Debian firefox-esr source history
Potentially problematic
OCaml
Has binary seeds in ./boot. There is camlboot, but it seems to be pretty inefficient (it takes hours to compile where regular OCaml takes minutes).
Problematic software
Zig
Threw out the C++ implementation in favor of a large WASM binary seed in 0.10.0; for now it's chained bootstrapping. Hopefully an alternative compiler written in a bootstrapped language will appear, because keeping versions of LLVM all the way back to 13 working properly, like Guix does, just doesn't seem reasonable.
Zig Reproduced Without Binaries (related debbugs.gnu.org entry): Successfully reproduces Zig binaries within Guix, but is sadly impractical as it uses 53+ intermediate versions between 0.9.1-ish (inclusive) and 0.13.
Erlang
Documented as originally implemented in Prolog; now version n requires binaries of version n-1 or n to build. No alternative compiler is known so far.
Rust
There is mrustc (packaged in Guix and Gentoo) but it tends to lag behind by about ten 1.x versions, each of which you sadly need to compile as an intermediary step. Rustc also vendors several other projects like LLVM and Rust crates (enjoy non-installable libraries), similarly to other Rust software.
GCC Rust Frontend is also not ready yet (2023-03) for userland, as cargo doesn't bootstrap…
Cargo
As if rustc being a bootstrapping problem weren't enough, cargo, the buildsystem+dependency-installer for Rust software, depends on ~60 direct libraries, notably including 2+ git libraries, HTTP Authentication, and OpenSSL.
Cargo isn't a buildsystem, it's a full-blown package manager and a troublemaker when it comes to dependency management due to the vulnerable-by-design crates.io.
It really ought to be replaced by something which only takes care of building code (or even just generating a Makefile or a build.ninja file), as was done in the C ecosystem many times in the past (pkg-config ⇒ pkgconf, ninja ⇒ samurai, …).
This isn't a system that scales, this is just creating a gigantic blob of software that cannot be reasonably audited, right in the toolchain.
Java
Requires compilers abandoned ~10 years ago; currently the chain doesn't build all the way to OpenJDK for me.
Free-Pascal Compiler / Object Pascal
Aesop seems to still be at the vaporware stage, no code is available.
Nim
The transpiled C non-source code used for bootstrapping, contained in ./c_code/, is pretty much what you would get with C++ mangled symbols auto-decompiled to C.
Bootstrapping Nim via historical releases would need a bootstrap path for Object Pascal, which doesn't exist (yet?). Another way would be to have a minimal Nim compiler, written in another language, which is capable of compiling the current compiler.
QEMU
QEMU 7.0 needs a quick fix on the pc-bios/meson.build file so you can choose not to use the binaries it ships; this is fixed in QEMU 7.1.
They are still required, so this means identifying the source of all of them and having proper from-source packaging. That is already done in Gentoo for SeaBIOS and EDK2-OVMF (UEFI), which is enough to boot machines but not for full x86 support; non-x86 is even more problematic (i.e. which upstream is used for OpenBIOS/OpenFirmware as used for sparc32, sparc64 and ppc32).
wine-mono
In Gentoo it's a collection of binaries. Upstream repository is at https://github.com/madewokherd/wine-mono but still includes binaries…
mono / .NET
Source-only building is unsupported and nearly impossible (massive chain + intermediary unstable versions).
It should also be noted that Mono itself started with the Microsoft C# compiler (History | Mono) instead of dotGNU (which has been dead since 2012).
2024-11-30 Update: unmush managed to build mono all the way to 6.12.0 on Guix: [PATCH] Adding a fully-bootstrapped mono
- potential prebuilt binaries in the Mono package
- Cannot build without binary-reference-assemblies
- Full source bootstrap · Issue #1930 · dotnet/source-build
- mono-6.12.0.199_deblob.log, mono-6.12.0.199_deblob.json: Automatically generated list of blobs via deblob on mono-6.12.0.199.tar.xz
Chez Scheme
Requires bootstrap files; GNU GuixSD packaging doesn't seem to have it figured out yet.
NekoVM
Doesn't seem possible to build without the boot/*.n files being present, which are NekoVM bytecode files.
Not Quite Perl (NQP)
Doesn't seem possible to build without the src/vm/moar/stage0/*.moarvm files being present, which are MoarVM bytecode files.
This means no Rakudo/Perl6.
GNU gnulib
Ships lib/javaversion.class. Made [PATCH] lib/javaversion.class: Remove, build from source to have it built from source.
GNU gettext
gnulib java blob; 3 Java class files in gettext-tools/examples; gettext-tools/m4/csharpexec-test.exe which doesn't have source code (C# is effectively proprietary anyway). Did sys-devel/gettext: deblob to fix it.
TypeScript
The compiler itself is written in TypeScript, with no bootstrap path possible as the commit introducing the compiler is already TypeScript code. Want a TypeScript compiler? Get a blob from npmjs.org, like the Initial commit tells you.
An alternative might be swc (Rust). Note that Deno (also Rust) just grabs pre-transpiled JS from Microsoft, and Babel simply seems to depend on the typescript package.
And it should be noted that TypeScript seems to have no specification anymore. (Commit: Remove doc folder (old archived spec and assets), word2md script)
2025-03 Update: TypeScript is porting its compiler to Go for performance reasons, but this should also mean getting bootstrappability, hopefully of the full-source kind.
Dart
Yet another chicken-egg language without a single documented way to bootstrap it from source. I wish they had learned from the other language from Google: Go.
rollup
- chicken-egg
- Uses rollup to build itself
- one-step circular dependency
- rollup → acorn → rollup
- links to a two-step circular dependency
- rollup → eslint → webpack → acorn → eslint
Note: acorn doesn't list its dependencies on npmjs because it publishes a pre-compiled version…
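Cycles like these can be found mechanically once the build-time dependencies are written down as a graph. Below is a minimal sketch using a depth-first search; the `BUILD_DEPS` table is transcribed by hand from the list above rather than pulled from actual package metadata, and `find_cycle` is a name made up for this example.

```python
# Build-time dependency edges, hand-transcribed from the list above (an assumption,
# not generated from package.json files).
BUILD_DEPS = {
    "rollup": ["acorn", "eslint"],
    "acorn": ["rollup", "eslint"],
    "eslint": ["webpack"],
    "webpack": ["acorn"],
}


def find_cycle(graph, start):
    """Return one dependency cycle reachable from `start` as a list of names, or None."""
    path = []        # packages on the current depth-first search path, in order
    on_path = set()  # same content as `path`, for fast membership tests
    done = set()     # packages fully explored without finding a cycle

    def visit(node):
        if node in on_path:
            # Reached a package already on the path: slice the loop out of it.
            return path[path.index(node):] + [node]
        if node in done:
            return None
        path.append(node)
        on_path.add(node)
        for dep in graph.get(node, ()):
            cycle = visit(dep)
            if cycle is not None:
                return cycle
        path.pop()
        on_path.remove(node)
        done.add(node)
        return None

    return visit(start)


print(" → ".join(find_cycle(BUILD_DEPS, "rollup")))  # rollup → acorn → rollup
```

It only reports the first cycle it runs into, which is enough to tell that a package cannot be built from source without some pre-built artifact in the loop.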