blog

My website can't be that messy, right? git clone https://anongit.hacktivis.me/git/blog.git/

bootstrapping.shtml (16188B)


  1. <!DOCTYPE html>
  2. <html xmlns="http://www.w3.org/1999/xhtml">
  3. <head>
  4. <!--#include file="/templates/head.shtml" -->
  5. <title>Bootstrapping — lanodan’s cyber-home</title>
  6. </head>
  7. <body>
  8. <!--#include file="/templates/en/nav.shtml" -->
  9. <main class="section-count">
  10. <h1>Bootstrapping</h1>
  11. <ul>
  12. <li><a href="https://bootstrappable.org/">Bootstrappable Builds</a> (GNU Guix focus)</li>
  13. <li><a href="https://bootstrapping.miraheze.org/wiki/Main_Page">bootstrapping wiki</a></li>
  14. <li><a href="https://dwheeler.com/trusting-trust/">David A. Wheeler’s Page on Fully Countering Trusting Trust through Diverse Double-Compiling (DDC) - Countering Trojan Horse attacks on Compilers</a> (Note: Requires trustworthy bootstrap compiler(s) as starting point)</li>
  15. <li><a href="https://www.quora.com/What-is-a-coders-worst-nightmare/answer/Mick-Stute?srid=tQ46&amp;share=1">Mick Stute's answer to What is a coder's worst nightmare?</a></li>
  16. </ul>
  17. <h2>Reasons</h2>
  18. <dl>
  19. <dt>Security</dt>
  20. <dd>See the "<a href="#devtools-backdoors">Backdoors inserted into dev tools</a>" section</dd>
  21. <dt>Portability</dt>
  22. <dd>Binary executables have much higher <a href="https://en.wikipedia.org/wiki/Software_rot">bitrot</a> than source code and keeping obsolete binary interfaces often means keeping known security issues.</dd>
  23. <dt>Maintainability</dt>
  24. <dd>Makes sure someone else can actually continue maintaining the software, whether as the canonical version or as a fork.</dd>
  25. <dt>Reproducibility's other side of the coin</dt>
  26. <dd>One of <a href="https://reproducible-builds.org/">reproducibility</a>'s effects is allowing audits of source code instead of binaries, but said source code needs to actually be used.</dd>
  27. </dl>
  28. <h2 id="devtools-backdoors">Backdoors inserted into dev tools</h2>
  29. <p>This is by no means an exhaustive list, mostly because it happens way too regularly on npm.</p>
  30. <h3>Ken Thompson “Reflections on Trusting Trust” Compiler</h3>
  31. <p>In chronological order:</p>
  32. <ol>
  33. <li>1983-10: <a href="https://dl.acm.org/doi/10.1145/358198.358210">Reflections on Trusting Trust, Ken Thompson</a></li>
  34. <li>2022-07-27: <a href="https://niconiconi.neocities.org/posts/ken-thompson-really-did-launch-his-trusting-trust-trojan-attack-in-real-life/">Ken Thompson Really Did Launch His "Trusting Trust" Trojan Attack in Real Life</a></li>
  35. <li>2023-03-12: <a href="https://www.youtube.com/live/kaandEt_pKw?t=3284">Q&amp;A section after Ken Thompson talk on keeping his music collection</a> (<a href="https://www.socallinuxexpo.org/scale/20x/presentations/keynote-ken-thompson">Talk page</a>)</li>
  36. <li>2023-10-25: <a href="https://research.swtch.com/nih">research!rsc: Running the “Reflections on Trusting Trust” Compiler</a>: This notably contains the code that Ken Thompson used, together with explanations</li>
  37. </ol>
  38. <h3>CVE-2024-3094: Jia Tan backdoor in xz-utils</h3>
  39. <p>
  40. This one is noteworthy for primarily being an insertion of a payload
  41. into a <code>./configure</code> script generated by autotools rather
  42. than a binary; being a near-successful attack on OpenSSH when patched
  43. to link with libsystemd; and having been detected pretty much
  44. by a combination of sheer curiosity and a lucky pre-existing benchmark.
  45. </p>
  46. <ol>
  47. <li><a href="https://www.openwall.com/lists/oss-security/2024/03/29/4">Discovery by Andres Freund</a></li>
  48. <li><a href="https://tukaani.org/xz-backdoor/">https://tukaani.org/xz-backdoor/</a></li>
  49. <li><a href="https://gist.github.com/thesamesam/223949d5a074ebc3dce9ee78baad9e27">Detailed FAQ on the xz-utils backdoor</a> by Sam James</li>
  50. </ol>
  51. <h3>Proofs of Concept</h3>
  52. <ul>
  53. <li><a href="https://manishearth.github.io/blog/2016/12/02/reflections-on-rusting-trust/">Reflections on Rusting Trust</a>: Backdooring The One True Rust Compiler.</li>
  54. </ul>
  55. <h2>Definitions</h2>
  56. <dl>
  57. <dt>Source Code</dt>
  58. <dd>
  59. <p>
  60. As shared by GPLv3,
  61. the <a href="https://www.gnu.org/philosophy/free-sw.html#make-changes">Free Software Definition</a>,
  62. and the <a href="https://opensource.org/osd">Open-Source Definition</a>.
  63. Each differs slightly; the latter is reproduced below as it explicitly excludes
  64. obfuscation and codegen:
  65. </p>
  66. <blockquote>
  67. [The] Source code must be the preferred form in which a programmer would
  68. modify the program. Deliberately obfuscated source code is not allowed.
  69. Intermediate forms such as the output of a preprocessor or translator
  70. are not allowed.
  71. </blockquote>
  72. </dd>
  73. </dl>
  74. <h2 id="tools">Tools</h2>
  75. <dl>
  76. <dt><a href="https://hacktivis.me/projects/deblob">deblob</a></dt>
  77. <dd>Remove known binary executable formats (including bytecode), designed to be fast enough to barely impact distro-scale package building performance, cannot detect all blobs</dd>
  78. <dt>Debian's <a href="https://salsa.debian.org/debian/devscripts/-/blob/master/scripts/suspicious-source">suspicious-source</a> script</dt>
  79. <dd>Lists what isn't present in a list of source code formats, good for manual audits. Python+<code>magic(5)</code> means it is quite slow.</dd>
  80. </dl>
  81. <h2>Disclaimers</h2>
  82. <p>
  83. Unless you plan to make a system language, bootstrapping doesn't
  84. have to be done from C or a low-level language.
  85. It's fine to use any language as long as it's also bootstrappable
  86. and wouldn't introduce circular dependencies.
  87. </p>
  88. <p>
  89. For example: Go and Lua are fine.
  90. Perl and Python are a bit involved but live-bootstrap got both.
  91. Any language supported by GCC is also okay, although a simpler backend such as
  92. <a href="https://c9x.me/compile/">QBE</a>
  93. could be more interesting.
  94. </p>
  95. <h2>Non-Problematic / Praise</h2>
  96. <h3 id="go">Go</h3>
  97. <p>
  98. <a href="https://golang.org/doc/install/source">Installing Go from source</a>
  99. in the official Go documentation details it: both gccgo and a chain branching
  100. out of Go 1.4 are supported.
  101. </p>
  102. <p>
  103. More recent versions do require a bootstrap chain but as Go's own
  104. toolchain is standalone it's fine.
  105. </p>
  106. <h2>Historically problematic</h2>
  107. <h3 id="firefox_python2">Firefox &gt;=68 &lt;=78</h3>
  108. <p>Firefox would bundle python2 and refuse to build if removed. See <a href="https://salsa.debian.org/mozilla-team/firefox/-/commits/esr78/master/obj-x86_64-pc-linux-gnu/_virtualenvs/init/bin">Debian firefox-esr source history</a></p>
  109. <h2>Potentially problematic</h2>
  110. <h3>OCaml</h3>
  111. <p>Has binary seeds in <code>./boot</code>. There is <a href="https://github.com/Ekdohibs/camlboot">camlboot</a>, but it seems to be pretty inefficient (it takes hours to compile where regular OCaml takes minutes)</p>
  112. <h2>Problematic software</h2>
  113. <h3 id="zig">Zig</h3>
  114. <p>
  115. <a href="https://ziglang.org/news/goodbye-cpp/">Threw out the C++ implementation in favor of a <strong>large</strong> WASM binary seed</a> in 0.10.0,
  116. for now it's chained-bootstrapping.
  117. Hopefully an alternative compiler written in a bootstrapped language
  118. will appear, because keeping versions of LLVM all the way to 13
  119. working properly like Guix does just doesn't seem reasonable.
  120. </p>
  121. <p>
  122. <a href="https://jakstys.lt/2024/zig-reproduced-without-binaries/">Zig Reproduced Without Binaries</a>
  123. (<a href="https://debbugs.gnu.org/cgi/bugreport.cgi?bug=74217">related debbugs.gnu.org entry</a>):
  124. Successfully reproducing Zig binaries within Guix, sadly impractical as it
  125. uses 53+ intermediate versions between 0.9.1-ish (inclusive) and 0.13.
  126. </p>
  127. <h3 id="erlang">Erlang</h3>
  128. <p>Documented as originally implemented in Prolog; now version <i class="math">n</i> requires binaries of version <i class="math">n-1</i> or <i class="math">n</i> to build. No alternative compiler known so far.</p>
  129. <h3 id="rust">Rust</h3>
  130. <p>
  131. There is <a href="https://github.com/thepowersgang/mrustc">mrustc</a>
  132. (packaged in Guix and Gentoo)
  133. but it tends to lag behind by about ten 1.x versions,
  134. each of which you sadly need to compile as an intermediary step.
  135. Rustc also vendors several other projects like LLVM and rust crates
  136. (enjoy non-installable libraries), similarly to other rust software.
  137. </p>
  138. <p>
  139. GCC Rust Frontend is also not ready yet (2023-03) for userland,
  140. as <a href="#cargo">cargo</a> doesn't bootstrap…
  141. </p>
  142. <h3 id="cargo">Cargo</h3>
  143. <p>
  144. As if rustc being a bootstrapping problem weren't enough, cargo,
  145. the buildsystem+dependency-installer for Rust software depends on
  146. <a href="https://github.com/rust-lang/cargo/blob/master/Cargo.toml">~60 direct libraries</a>,
  147. notably including 2+ git libraries, HTTP Authentication, and OpenSSL.<br />
  148. </p>
  149. <p>
  150. Cargo isn't a buildsystem, it's a full blown package manager
  151. and a troublemaker when it comes to dependency management due to
  152. <a href="https://drewdevault.com/2022/05/12/Supply-chain-when-will-we-learn.html">designed-vulnerable crates.io</a>.
  153. </p>
  154. <p>
  155. It really ought to be replaced by something which only
  156. takes care of building code (or even just generating
  157. a <code>Makefile</code> or a <code>build.ninja</code> file),
  158. as was done in the C ecosystem many times in the past
  159. (pkg-config ⇒ <a href="https://gitea.treehouse.systems/ariadne/pkgconf">pkgconf</a>,
  160. ninja ⇒ <a href="https://github.com/michaelforney/samurai">samurai</a>,
  161. …).<br />
  162. This isn't a system that scales, this is just creating a gigantic blob
  163. of software that cannot be reasonably audited, right in the toolchain.
  164. </p>
  165. <h3 id="java">Java</h3>
  166. <p>Requires compilers abandoned ~10 years ago; currently doesn't build up to OpenJDK for me.</p>
  167. <h3>Free-Pascal Compiler / Object Pascal</h3>
  168. <p><a href="https://bootstrapping.miraheze.org/wiki/Aesop">Aesop</a> seems to still be at the vaporware stage, no code is available.</p>
  169. <h3 id="nim">Nim</h3>
  170. <p>
  171. The transpiled C non-source code used for bootstrapping contained in <code>./c_code/</code> is pretty much what you would get with C++ mangled symbols auto-decompiled to C.<br />
  172. <a href="https://bootstrapping.miraheze.org/wiki/Bootstrapping_Nim">Bootstrapping Nim via historical releases</a> would need a bootstrap path for Object Pascal, which doesn't exist (yet?). Another way would be to have a minimal Nim compiler, written in another language, which is capable of compiling the current compiler.
  173. </p>
  174. <h3 id="qemu">QEMU</h3>
  175. <p>
  176. QEMU 7.0 <a href="https://github.com/gentoo/gentoo/commit/11c7bca43160b3d893dc8d846d8da2838332123c">needs a quick fix on the <code>pc-bios/meson.build</code> file</a> so you can choose to not use the binaries it ships, fixed in QEMU 7.1.<br />
  177. They are still required, so it means identifying the source of all of them and having proper from-source packaging. This is already done in Gentoo for SeaBIOS and EDK2-OVMF (UEFI), which is enough to boot machines but not for full x86 support; non-x86 is even more problematic (i.e. which upstream is used for OpenBIOS/OpenFirmware as used for sparc32, sparc64 and ppc32).
  178. </p>
  179. <h3 id="wine-mono">wine-mono</h3>
  180. <p>In Gentoo it's a collection of binaries. The upstream repository is at <a href="https://github.com/madewokherd/wine-mono">https://github.com/madewokherd/wine-mono</a> but it still includes binaries…</p>
  181. <h3 id="mono">mono / .NET</h3>
  182. <p>
  183. Source-only building is unsupported and nearly impossible (massive chain + intermediary unstable versions).<br />
  184. It should also be noted that Mono itself started out with the Microsoft C# compiler (<a href="https://www.mono-project.com/docs/about-mono/history/">History | Mono</a>) instead of <a href="https://www.gnu.org/software/dotgnu/">dotGNU</a> (which has been dead since 2012).
  185. </p>
  186. <p>2024-11-30 Update: unmush managed to build mono all the way to 6.12.0 on Guix: <a href="https://debbugs.gnu.org/cgi/bugreport.cgi?bug=74609">[PATCH] Adding a fully-bootstrapped mono</a></p>
  187. <ul>
  188. <li><a href="https://issues.guix.gnu.org/55026">potential prebuilt binaries in the Mono package</a></li>
  189. <li><a href="https://github.com/mono/mono/issues/7445">Cannot build without binary-reference-assemblies</a></li>
  190. <li><a href="https://github.com/dotnet/source-build/issues/1930">Full source bootstrap · Issue #1930 · dotnet/source-build</a></li>
  191. <li><a href="/notes/mono-6.12.0.199_deblob.log">mono-6.12.0.199_deblob.log</a>, <a href="/notes/mono-6.12.0.199_deblob.json">mono-6.12.0.199_deblob.json</a>: Automatically generated list of blobs via deblob on mono-6.12.0.199.tar.xz</li>
  192. </ul>
  193. <h3 id="chez">Chez Scheme</h3>
  194. <p>Requires bootstrap files; <a href="https://git.savannah.gnu.org/cgit/guix.git/tree/gnu/packages/chez.scm">GNU GuixSD packaging</a> doesn't seem to have it figured out yet.</p>
  195. <h3 id="neko">NekoVM</h3>
  196. <p>Doesn't seem possible to build without <code>boot/*.n</code> files being present, which are NekoVM bytecode files.</p>
  197. <h3 id="nqp">Not Quite Perl (NQP)</h3>
  198. <p>
  199. Doesn't seem possible to build without <code>src/vm/moar/stage0/*.moarvm</code> files being present, which are MoarVM bytecode files.
  200. This means no Rakudo/Perl6.
  201. </p>
  202. <h3 id="gnulib">GNU gnulib</h3>
  203. <p><code>lib/javaversion.class</code>. Made <a href="https://hacktivis.me/tmp/0001-lib-javaversion.class-Remove-build-from-source.patch">[PATCH] lib/javaversion.class: Remove, build from source</a> to have it built from source.</p>
  204. <h3 id="gettext">GNU gettext</h3>
  205. <p>gnulib java blob; 3 Java class files in <code>gettext-tools/examples</code>; <code>gettext-tools/m4/csharpexec-test.exe</code> which doesn't have source code (C# is effectively proprietary anyway). Did <a href="https://github.com/gentoo/gentoo/commit/54b36e80f7c3910ae1557c2faafda3d6d62daf49">sys-devel/gettext: deblob</a> to fix it.</p>
  206. <h3 id="typescript">TypeScript</h3>
  207. <p>Compiler itself is written in TypeScript, no bootstrap path possible as the <a href="https://github.com/microsoft/TypeScript/commit/214df64e287804577afa1fea0184c18c40f7d1ca">commit introducing the compiler</a> is TypeScript code. Want TypeScript compiler? Get a blob from <code>npmjs.org</code>, like the <a href="https://github.com/microsoft/TypeScript/commit/99ec3a96880649eeaa08c3df30e3ae802048f4fe">Initial commit</a> tells you.</p>
  208. <p>
  209. Alternative might be <a href="https://github.com/swc-project/swc">swc</a> (<a href="#rust">Rust</a>). Note that <a href="https://deno.land/">Deno</a> (also <a href="#rust">Rust</a>) just <a href="https://github.com/denoland/deno/blob/main/tools/update_typescript.md">grabs pre-transpiled JS from Microsoft</a> and <a href="https://babeljs.io/">Babel</a> simply seems to depend on the <code>typescript</code> package.<br />
  210. And it should be noted that TypeScript seems to have no specification anymore. (Commit: <a href="https://github.com/microsoft/TypeScript/commit/91822db8e01e38e1f9d80142df67d3849851571d">Remove doc folder (old archived spec and assets), word2md script</a>)
  211. </p>
  212. <p>2025-03 Update: <a href="https://github.com/microsoft/typescript-go">TypeScript is porting its compiler to Go</a>, for performance reasons, but it should also mean getting bootstrappability, hopefully of the full-source kind</p>
  213. <h3 id="dart">Dart</h3>
  214. <p>
  215. Yet another chicken-egg language without a single documented way to bootstrap it from source. I wish they had learned from the other language from Google: Go.
  216. </p>
  217. <h3 id="rollup">rollup</h3>
  218. <p>
  219. <dl>
  220. <dt>chicken-egg</dt><dd>Uses rollup to build itself</dd>
  221. <dt>one-step circular dependency</dt><dd>rollup → acorn → rollup</dd>
  222. <dt>links to a two-step circular dependency</dt><dd>rollup → eslint → webpack → acorn → eslint</dd>
  223. </dl>
  224. I guess web development can also mean creating cyclic graphs of dependencies.<br />
  225. Note: acorn doesn't list its dependencies on npmjs because it publishes a pre-compiled version…
  226. </p>
  227. </main>
  228. <!--#include file="/templates/en/footer.shtml" -->
  229. </body>
  230. </html>