commit: 8032ed659cd74af4d9802995c9df444724a2b0b0
parent 980c2b19323ee166cb09b83718df17ba6addcaf7
Author: Drew DeVault <sir@cmpwn.com>
Date: Thu, 30 Dec 2021 14:13:05 +0100
Breaking down a small language design proposal
Diffstat:
1 file changed, 203 insertions(+), 0 deletions(-)
diff --git a/content/blog/Language-design-considerations.md b/content/blog/Language-design-considerations.md
@@ -0,0 +1,203 @@
+---
+title: Breaking down a small language design proposal
+date: 2021-12-30
+---
+
+<style>
+.redacted { background: black; color: black; }
+</style>
+
+We are developing a new systems programming language. The name is a secret, so
+we'll call it <span class="redacted">xxxx</span> instead. In <span
+class="redacted">xxxx</span>, we have a general requirement that all variables
+must be initialized. This is fine for the simple case, such as "let x: int =
+10". But, it does not always work well. Let's say that you want to set aside a
+large buffer for I/O:
+
+```hare
+let x: [1024]int = [0, 0, 0, 0, 0, // ...
+```
+
+This can clearly get out of hand. To address this problem, we added the "..."
+operator:
+
+```hare
+let x: [1024]int = [0...];
+let y: *[1024]int = alloc([0...]);
+```
+
+This example demonstrates both *stack* allocation of a buffer and *heap*
+allocation of a buffer initialized with 1024 zeroes.[^static] This "..."
+operator neatly solves our problem. However, another problem occurs to me: what
+if you want to allocate a buffer of a variable size?
+
+[^static]: You can also use *static* allocation, which is not shown here.
+
+In addition to arrays, <span class="redacted">xxxx</span> supports slices, which
+stores a data pointer, a length, and a capacity. The data pointer refers to an
+array whose length is at equal to or greather than "capacity", and whose values
+are initialized up to "length". We have additional built-ins, "append",
+"insert", and "delete", which can dynamically grow and shrink a slice.
+
+```hare
+let x: []int = [];
+defer free(x);
+append(x, 1, 2, 3, 4, 5); // x = [1, 2, 3, 4, 5]
+delete(x[..2]); // x = [3, 4, 5]
+insert(x[0], 1, 2); // x = [1, 2, 3, 4, 5]
+```
+
+You can also allocate a slice whose capacity is set to an arbitrary value, but
+whose length is only equal to the number of initializers you provide. This is
+done through a separate case in the "alloc" grammar:
+
+```hare
+use types;
+
+let x: []int = alloc([1, 2, 3], 10);
+assert(len(x) == 3);
+assert((&x: *types::slice).capacity == 10);
+```
+
+This is useful if you know how long the slice will eventually be, so that you
+can fill it with "append" without re-allocating (which could be costly
+otherwise). However, setting the capacity is not the same thing as setting the
+length: all of the items between the length and capacity are uninitialized. How
+do we zero-initialize a large buffer in the heap?
+
+Until recently, you simply couldn't. You had to use a rather bad
+work-around:
+
+```hare
+use rt;
+
+let sz: size = 1024;
+let data: *[*]int = rt::malloc(sz * size(int)); // [*] is an array of undefined length
+let x: []int = data[..sz];
+```
+
+This is obviously not great. We lose type safety, the initialization guarantee,
+and bounds checking, and we add a footgun (multiplying by the member type size),
+and it's simply not very pleasant to use. To address this, we added the
+following syntax:
+
+```hare
+let sz: size = 1024;
+let x: []int = alloc([0...], sz);
+```
+
+Much better! Arriving at this required untangling a lot of other problems that I
+haven't mentioned here, but this isn't the design I want to focus on for this
+post. Instead, there's a new question this suggests: what about appending a
+variable amount of data to a slice? I want to dig into this problem to explore
+some of the concerns we think about when working on the language design.
+
+The first idea I came up with was the following:
+
+```hare
+let x: []int = [];
+append(x, [0...], 10);
+```
+
+This would append ten zeroes to "x". This has a problem, though. Consider our
+earlier example of "append":
+
+```hare
+append(x, 1, 2, 3);
+```
+
+The grammar for this looks like the following:[^caveat]
+
+[^caveat]: Disregard the second case of "append-values"; it's not relevant here.
+
+![A screenshot of the language spec showing the grammar for append expressions.](https://l.sr.ht/NinS.png)
+
+So, the proposed "append(x, \[0...\], 10)" expression is *parsed* like this:
+
+ slice-mutation-expression: append
+ object-selector: x
+ append-items:
+ [0...]
+ 10
+
+In other words, it looks like "append the values \[0...\] and 10 to to x". This
+doesn't make sense, but we don't know this until we get to the type checker.
+What it really means is "append ten zeroes to x", and we have to identify this
+case in the type checker through, essentially, heuristics. Not great! If we dig
+deeper into this we find even more edge cases, but I will spare you from the
+details.
+
+So, let's consider an alternative design:
+
+```hare
+append(x, [1, 2, 3]); // Previously append(x, 1, 2, 3);
+append(x, [0...], 10); // New feature
+```
+
+The grammar for this is much better:
+
+![A screenshot of the revised grammar for this design.](https://l.sr.ht/s5qI.png)
+
+Now we can distinguish between these cases while parsing, so the first example
+is parsed as:
+
+ append-expression
+ object-selector: x
+ expression: [1, 2, 3] // Items to append
+
+The second is parsed as:
+
+ append-expression
+ object-selector: x
+ expression: [0...] // Items to append
+ expression: 10 // Desired length
+
+This is a big improvement, but it comes with one annoying problem. The most
+common case for append in regular use in <span class="redacted">xxxx</span> is
+appending a single item, and this case has worsened thanks to this change:
+
+```hare
+append(x, [42]); // Previously append(x, 42);
+```
+
+In fact, appending several items at once is exceptionally uncommon: there are no
+examples of it in the standard library. We should try to avoid making the common
+case worse for the benefit of the uncommon case.
+
+A pattern we *do* see in the standard library is appending one slice to another,
+which is a use-case we've ignored up to this point. This use-case looks
+something like the following:
+
+```hare
+append(x, y...);
+```
+
+Why don't we lean into this a bit more?
+
+```hare
+let x: []int = [];
+append(x, 42); // x = [42]
+append(x, [1, 2, 3]...); // x = [42, 1, 2, 3]
+append(x, [0...], 6); // x = [42, 1, 2, 3, 0...]
+```
+
+Using the `append(x, y...)` syntax to generally handle appending several items
+neatly solves all of our problems. We have arrived at design which:
+
+- Is versatile and utilitarian
+- Addresses the most common cases with a comfortable syntax
+- Is unambiguous at parse time without type heuristics
+
+I daresay that, in addition to fulfilling the desired new feature, we have
+improved the other cases as well. The final grammar for this is the following:
+
+![Formal grammar showing the final state of the design proposal](https://l.sr.ht/2nF0.png)
+
+If you're curious to see more, I've extracted the relevant page of the
+specification for you to read: [download it here](https://l.sr.ht/eeta.pdf). I
+hope you found that interesting and insightful!
+
+*Note: Much of these details are subject to change, and we have future
+improvements planned which will affect these features — particularly with
+respect to handling allocation failures. Additionally, some of the code samples
+were simplified for illustrative purposes.*