unix-defects.xhtml (8334B)
- <!DOCTYPE html>
- <html xmlns="http://www.w3.org/1999/xhtml">
- <head>
- <!--#include file="/templates/head.shtml" -->
- <title>Unix defects — lanodan’s cyber-home</title>
- </head>
- <body>
- <!--#include file="/templates/en/nav.shtml" -->
- <main>
- <h1>Unix defects</h1>
- <p>This page tries to list the defects present in Unix, an OS from the early 1970s. By "Unix" I mean what current Unix clones (BSDs, illumos, Linux, …) have implemented.</p>
- <p>None of this should be present in brand new systems except within a cleanly-separated compatibility layer (like Plan9 ape).</p>
- <h3 id="lists"><code>NULL</code>-Terminated lists</h3>
- <dl>
- <dt>Slow to parse</dt><dd>Obtaining the length means scanning every <em>byte</em> up to the terminator, aka <code role="math">O(n)</code>, while reading a length prefix is constant-time, aka <code role="math">O(1)</code>.</dd>
- <dt>Inefficient &amp; Unsafe slices</dt><dd>To take a slice without modifying the source, you still need to copy the wanted part and terminate the copy with <code>NULL</code>. With a length prefix you can reuse the source as-is via an offset (or pointer) and a different length.</dd>
- <dt>Unsafe</dt><dd>How do you handle <code>NULL</code> being present in the middle of the list? Or <code>NULL</code> being absent?</dd>
- </dl>
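- <p>A minimal sketch of the slicing and length points above, in C. The <code>struct slice</code> type and the helpers <code>slice_len</code>, <code>slice_sub</code> and <code>cstr_sub</code> are hypothetical names made up for illustration, not part of any standard API.</p>

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical length-prefixed view: a pointer plus an explicit length. */
struct slice {
    const char *ptr;
    size_t len;
};

/* O(1): the length is stored, nothing is scanned. */
static size_t slice_len(struct slice s)
{
    return s.len;
}

/* Sub-slice that reuses the source buffer: no copy, no write to the source.
 * Returns an empty slice with a NULL pointer if the range is out of bounds. */
static struct slice slice_sub(struct slice s, size_t start, size_t len)
{
    struct slice out = { NULL, 0 };

    if (start > s.len || len > s.len - start)
        return out;
    out.ptr = s.ptr + start;
    out.len = len;
    return out;
}

/* Terminated equivalent: O(n) strlen, then allocate, copy and terminate.
 * The caller must free() the result. Returns NULL on failure. */
static char *cstr_sub(const char *src, size_t start, size_t len)
{
    size_t total = strlen(src);
    char *out;

    if (start > total || len > total - start)
        return NULL;
    out = malloc(len + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, src + start, len);
    out[len] = '\0';
    return out;
}
```

- <p>With <code>struct slice s = { "hello world", 11 }</code>, <code>slice_sub(s, 6, 5)</code> points into the original buffer, while <code>cstr_sub("hello world", 6, 5)</code> has to allocate a fresh copy.</p>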
- <p>
- And as C doesn't have a specific type for strings (<code>char</code> represents a character in the same way a <a href="https://en.wikipedia.org/wiki/Memory_word">"word" of memory</a> represents some kind of word), the defects apply to all lists.
- This is why most of the C API regarding strings cannot be used safely (<code>strcpy</code> vs <code>strncpy</code> or just <code>memcpy</code>), and why the APIs of so many third-party C libraries are architecturally broken.
- </p>
- <h3 id="errno"><code>errno</code></h3>
- <p>
- Implementation-defined, and it had to be moved into thread-local storage on modern systems, so it's not actually a global variable… enjoy.<br />
- It's also a very poor way to handle errors. If you're wondering where the occasional "Error: Success" comes from: this is it.<br />
- And of course, it means having a pretty much fixed set of possible errors.
- </p>
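- <p>A minimal sketch of how such a message can come about, assuming a function that reports failure through its return value without setting <code>errno</code>. Both <code>parse_positive</code> and <code>describe_failure</code> are hypothetical helpers written for this illustration.</p>

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical parser: returns the digit value, or -1 on failure.
 * Like many functions, it fails without setting errno. */
static int parse_positive(const char *s)
{
    if (s == NULL || *s < '1' || *s > '9')
        return -1;      /* failure, errno left untouched */
    return *s - '0';
}

/* Builds the message a careless caller would print after the failure. */
static void describe_failure(char *buf, size_t size)
{
    errno = 0;          /* stands in for "no earlier call has failed" */
    if (parse_positive("x") < 0)
        snprintf(buf, size, "Error: %s", strerror(errno));
}
```

- <p>The text after "Error: " is whatever <code>strerror(0)</code> returns; on glibc that string is "Success". Nothing ties the value of <code>errno</code> to the call that actually failed.</p>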
- <h3 id="libnss"><code>nsswitch.conf</code>, <code>resolv.conf</code>, …</h3>
- <p>
- Those configuration files ought to be truly language-independent. Instead they are tied by design to <code>libnss</code> (not the Netscape/Mozilla SSL/TLS library) and <code>libresolv</code>, and they create a lot of problems when read by other programs (such as not dealing correctly with the <code>options</code> of <code>resolv.conf</code>).<br />
- Please consider: Clean ABI; Proper servers; Virtual filesystems (could look like <a href="https://www.openwall.com/tcb/">tcb shadow</a> for <code>passwd</code>).<br />
- See Also: <a href="https://skarnet.org/software/nsss/nsswitch.html">The problem with nsswitch</a> for the security angle.
- </p>
- <p>
- By the way, while <a href="#getaddrinfo"><code>getaddrinfo(3)</code></a>, <a href="#gethostbyname"><code>gethostbyname(3)</code></a>, … are part of the POSIX standard, functions like <code>res_query</code>, needed to actually query DNS records (<code>MX</code>, <code>SRV</code>, …), aren't standardized.
- </p>
- <h3 id="getaddrinfo"><code>getaddrinfo(3)</code></h3>
- <p>Related to <a href="#libnss"><code>nsswitch.conf</code></a>.</p>
- <ul>
- <li>Enjoy most developers having to write code to handle multiple records. Hopefully an unreachable/slow host isn't fatal…</li>
- <li>Cannot handle <a href="https://en.wikipedia.org/wiki/Happy_Eyeballs">Happy Eyeballs</a> (one of the ways to support IPv6)</li>
- <li>Doesn't handle <a href="https://en.wikipedia.org/wiki/SRV_record"><code>SRV</code> records</a>, similarly to how email is using <a href="https://en.wikipedia.org/wiki/MX_record"><code>MX</code> records</a> (not handled either but at least it's a special case).</li>
- </ul>
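- <p>The loop most developers end up writing for the first point looks roughly like this sketch. <code>connect_any</code> is a hypothetical helper name. It tries the records strictly one after another with blocking <code>connect(2)</code> calls, so it does not implement Happy Eyeballs and a slow host delays all the following ones.</p>

```c
#define _POSIX_C_SOURCE 200809L
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Tries every record returned by getaddrinfo(3) in order.
 * Returns a connected socket, or -1 if resolution or every attempt failed. */
static int connect_any(const char *host, const char *service)
{
    struct addrinfo hints, *res, *ai;
    int fd = -1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;        /* both IPv4 and IPv6 records */
    hints.ai_socktype = SOCK_STREAM;

    if (getaddrinfo(host, service, &hints, &res) != 0)
        return -1;

    for (ai = res; ai != NULL; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;
        /* Blocking: an unreachable host stalls here until the kernel gives up. */
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            break;                      /* connected, keep this fd */
        close(fd);
        fd = -1;
    }

    freeaddrinfo(res);
    return fd;
}
```

- <p>A caller would use something like <code>connect_any("example.org", "https")</code> and still has to add its own timeouts or parallel attempts on top.</p>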
- <p>Compare this to <a href="http://man.9front.org/2/dial">Plan9 <code>dial(2)</code></a> which also has a nice <code>NetConnInfo</code> structure.</p>
- <h3 id="gethostbyname"><code>gethostbyname(3)</code></h3>
- <p>The older brother of <a href="#getaddrinfo"><code>getaddrinfo(3)</code></a>, it doesn't handle multiple records.</p>
- <h3 id="fs_io">Filesystem I/O</h3>
- <p>
- Removable storage has been a thing on computers since more or less the beginning (punched cards, tape, floppies, CDs, …).
- Buggy storage devices also happen too often to be ignorable.
- Networked filesystems and services exposing a filesystem (Plan9, FUSE) have also been there for a long time.<br />
- Yet somehow, even modern Unixes usually cannot handle them properly, leaving <strong>uninterruptible</strong> processes if they happen to use I/O syscalls on an erroneous target.
- </p>
- <p>Even BSD sockets work better on this front (which is probably why <code>libnfs</code> exists).</p>
- <h3 id="fs_query">Filesystem Queries</h3>
- <p>
- Most network protocols today let the client ask the server to search inside some database. Meanwhile Unix filesystems don't even integrate <code>glob</code>; that function is stuck in standard libraries. People end up relying on third-party, I/O-thrashing central databases/indexes (again, removable and network storage are a thing) from non-standard solutions like <code>locate</code>, which are difficult to reuse in other programs.<br />
- Meaning that applications also often roll their own solution.<br />
- Compare this to Haiku, where queries are built into the filesystem.
- </p>
- <h3 id="fs_atom">Filesystem lack of transactions</h3>
- <p>
- On Unixes, because writes cannot be grouped into transactions (i.e. <code>BEGIN … COMMIT</code> in SQL), the only way to get atomicity is to do manual Copy-on-Write: copy to a temporary location, write there, and then rename to the final destination, meaning you need to have full control over it to avoid race conditions. And for atomicity over multiple files, a common parent directory is needed, otherwise you're in a bad state between the first rename and the last one.
- </p>
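- <p>The manual Copy-on-Write dance described above can be sketched as follows for a single file. <code>write_atomic</code> is a hypothetical helper name. The temporary file is created next to the target, since <code>rename(2)</code> only replaces atomically within one filesystem. The sketch ignores file permissions (<code>mkstemp</code> creates the file with mode 0600), retrying on <code>EINTR</code>, and syncing the parent directory.</p>

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Replaces `path` with `len` bytes from `data` so that readers see either
 * the old content or the new one. Returns 0 on success, -1 on failure. */
static int write_atomic(const char *path, const void *data, size_t len)
{
    char tmp[4096];
    const char *p = data;
    int fd;

    /* Temporary name in the same directory as the target. */
    if (snprintf(tmp, sizeof tmp, "%s.XXXXXX", path) >= (int)sizeof tmp)
        return -1;
    fd = mkstemp(tmp);
    if (fd < 0)
        return -1;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0)
            goto fail;
        p += n;
        len -= (size_t)n;
    }
    if (fsync(fd) != 0)                 /* data on disk before it becomes visible */
        goto fail;
    if (close(fd) != 0) {
        fd = -1;
        goto fail;
    }
    fd = -1;
    if (rename(tmp, path) != 0)         /* the atomic switch */
        goto fail;
    return 0;

fail:
    if (fd >= 0)
        close(fd);
    unlink(tmp);
    return -1;
}
```

- <p>Even for one file this is a full rewrite of the content, and nothing comparable exists for updating two files at once.</p>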
- <p>It also means:</p>
- <ul>
- <li>You can't do safe operating system updates without ignoring the traditional hierarchy or separating it into different filesystems (luckily ZFS subvolumes exist)</li>
- <li>It is horribly slow: you need to copy the file(s), and even when hardlinking the ones you're not writing to, it takes a very long time</li>
- </ul>
- <h3 id="devtmpfs"><code>/dev</code> isn't a virtual filesystem</h3>
- <ul>
- <li>Special files like those in <code>/dev</code> can be present anywhere, which means remote filesystems, removable storage, … can contain device files with different permissions. Don't forget the <code>nodev</code> mount option on basically everything.</li>
- <li>You must be root, or on Linux have the <code>CAP_MKNOD</code> capability, to create <code>/dev</code> special files. This forces things like creating an initramfs to run as root.</li>
- <li>Files present in <code>/dev</code> can be regular files, so you can accidentally end up with <code>/dev/null</code> taking over memory space as various programs write into it.</li>
- </ul>
- <p>It should be noted that Linux's <code>devtmpfs</code> somehow fixed <em>none</em> of the above.</p>
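- <p>Since a path under <code>/dev</code> can turn out to be a regular file, a cautious program can check the file type before trusting it. A minimal sketch, where <code>is_char_device</code> is a hypothetical helper name:</p>

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/stat.h>

/* Returns 1 if `path` is a character device, 0 if it is any other kind
 * of file, and -1 if it cannot be inspected. */
static int is_char_device(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    return S_ISCHR(st.st_mode) ? 1 : 0;
}
```

- <p>A result of 0 for <code>/dev/null</code> would mean it has been replaced by something else, such as a regular file that grows with every write.</p>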
- <p>
- <code>/dev</code> should be a virtual filesystem, say something that points to a device manager like udev or to the kernel itself, similarly to how it's done for <code>/proc</code> and <code>/sys</code> on Linux.
- </p>
- <h3 id="ulimit">ulimit</h3>
- <p>
- It should have been something like cgroups/Plan9-namespaces. Instead you get broken-by-default soft limits, which applications can override at any moment and are therefore useless, and hard limits that can realistically only be set per-user/per-group.<br />
- Ever wanted to limit the usage of <em>one</em> piece of software, like say the memory used by the browser? Well you can't with limits, and apparently cgroups have side-effects…
- </p>
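- <p>How little a soft limit constrains a program can be sketched in a few lines: any unprivileged process may raise its own soft limit up to the hard limit. <code>raise_soft_to_hard</code> is a hypothetical helper name. The resource argument is one of the standard <code>RLIMIT_*</code> constants.</p>

```c
#include <sys/resource.h>

/* Raises the soft limit of `resource` to the current hard limit.
 * Needs no privileges. Returns 0 on success, -1 on failure. */
static int raise_soft_to_hard(int resource)
{
    struct rlimit rl;

    if (getrlimit(resource, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;          /* soft limit := hard limit */
    return setrlimit(resource, &rl);
}
```

- <p>For example, a program started under a lowered soft address-space limit can call <code>raise_soft_to_hard(RLIMIT_AS)</code> and carry on as if the soft limit had never been set.</p>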
- <h2>See Also</h2>
- <ul>
- <li><a href="https://utcc.utoronto.ca/~cks/space/blog/unix/CLibraryAPIRequiresC">The Unix C library API can only be reliably used from C</a></li>
- </ul>
- </main>
- <!--#include file="/templates/en/footer.shtml" -->
- </body>
- </html>