Language-design-considerations.md - drewdevault.com - [mirror] blog and personal website of Drew DeVault

Language-design-considerations.md (7061B)

---
title: Breaking down a small language design proposal
date: 2021-12-30
---
<style>
.redacted { background: black; color: black; }
</style>
We are developing a new systems programming language. The name is a secret, so
we'll call it <span class="redacted">xxxx</span> instead. In <span
class="redacted">xxxx</span>, we have a general requirement that all variables
must be initialized. This is fine for the simple case, such as "let x: int =
10". But, it does not always work well. Let's say that you want to set aside a
large buffer for I/O:
```hare
let x: [1024]int = [0, 0, 0, 0, 0, // ...
```
This can clearly get out of hand. To address this problem, we added the "..."
operator:
```hare
let x: [1024]int = [0...];
let y: *[1024]int = alloc([0...]);
```
This example demonstrates both *stack* allocation of a buffer and *heap*
allocation of a buffer initialized with 1024 zeroes.[^static] This "..."
operator neatly solves our problem. However, another problem occurs to me: what
if you want to allocate a buffer of a variable size?
[^static]: You can also use *static* allocation, which is not shown here.
In addition to arrays, <span class="redacted">xxxx</span> supports slices, which
stores a data pointer, a length, and a capacity. The data pointer refers to an
array whose length is equal to or greater than "capacity", and whose values are
initialized up to "length". We have additional built-ins, "append", "insert",
and "delete", which can dynamically grow and shrink a slice.
```hare
let x: []int = [];
defer free(x);
append(x, 1, 2, 3, 4, 5); // x = [1, 2, 3, 4, 5]
delete(x[..2]);           // x = [3, 4, 5]
insert(x[0], 1, 2);       // x = [1, 2, 3, 4, 5]
```
You can also allocate a slice whose capacity is set to an arbitrary value, but
whose length is only equal to the number of initializers you provide. This is
done through a separate case in the "alloc" grammar:
```hare
use types;
let x: []int = alloc([1, 2, 3], 10);
assert(len(x) == 3);
assert((&x: *types::slice).capacity == 10);
```
This is useful if you know how long the slice will eventually be, so that you
can fill it with "append" without re-allocating (which could be costly
otherwise). However, setting the capacity is not the same thing as setting the
length: all of the items between the length and capacity are uninitialized. How
do we zero-initialize a large buffer in the heap?
Until recently, you simply couldn't. You had to use a rather bad
work-around:
```hare
use rt;
let sz: size = 1024;
let data: *[*]int = rt::malloc(sz * size(int)); // [*] is an array of undefined length
let x: []int = data[..sz];
```
This is obviously not great. We lose type safety, the initialization guarantee,
and bounds checking, and we add a footgun (multiplying by the member type size),
and it's simply not very pleasant to use. To address this, we added the
following syntax:
```hare
let sz: size = 1024;
let x: []int = alloc([0...], sz);
```
Much better! Arriving at this required untangling a lot of other problems that I
haven't mentioned here, but this isn't the design I want to focus on for this
post. Instead, there's a new question this suggests: what about appending a
variable amount of data to a slice? I want to dig into this problem to explore
some of the concerns we think about when working on the language design.
The first idea I came up with was the following:
```hare
let x: []int = [];
append(x, [0...], 10);
```
This would append ten zeroes to "x". This has a problem, though. Consider our
earlier example of "append":
```hare
append(x, 1, 2, 3);
```
The grammar for this looks like the following:[^caveat]
[^caveat]: Disregard the second case of "append-values"; it's not relevant here.
![A screenshot of the language spec showing the grammar for append expressions.](https://redacted.moe/f/d37abdd8.png)
So, the proposed "append(x, \[0...\], 10)" expression is *parsed* like this:
    slice-mutation-expression: append
        object-selector: x
        append-items:
            [0...]
            10
In other words, it looks like "append the values \[0...\] and 10 to x". This
doesn't make sense, but we don't know this until we get to the type checker.
What it really means is "append ten zeroes to x", and we have to identify this
case in the type checker through, essentially, heuristics. Not great! If we dig
deeper into this we find even more edge cases, but I will spare you from the
details.
So, let's consider an alternative design:
```hare
append(x, [1, 2, 3]);   // Previously append(x, 1, 2, 3);
append(x, [0...], 10);  // New feature
```
The grammar for this is much better:
![A screenshot of the revised grammar for this design.](https://redacted.moe/f/b0e56d79.png)
Now we can distinguish between these cases while parsing, so the first example
is parsed as:
    append-expression
        object-selector: x
        expression: [1, 2, 3]   // Items to append
The second is parsed as:
    append-expression
        object-selector: x
        expression: [0...]    // Items to append
        expression: 10        // Desired length
This is a big improvement, but it comes with one annoying problem. The most
common case for append in regular use in <span class="redacted">xxxx</span> is
appending a single item, and this case has worsened thanks to this change:
```hare
append(x, [42]);  // Previously append(x, 42);
```
In fact, appending several items at once is exceptionally uncommon: there are no
examples of it in the standard library. We should try to avoid making the common
case worse for the benefit of the uncommon case.
A pattern we *do* see in the standard library is appending one slice to another,
which is a use-case we've ignored up to this point. This use-case looks
something like the following:
```hare
append(x, y...);
```
Why don't we lean into this a bit more?
```hare
let x: []int = [];
append(x, 42);              // x = [42]
append(x, [1, 2, 3]...);    // x = [42, 1, 2, 3]
append(x, [0...], 6);       // x = [42, 1, 2, 3, 0...]
```
Using the `append(x, y...)` syntax to generally handle appending several items
neatly solves all of our problems. We have arrived at design which:
- Is versatile and utilitarian
- Addresses the most common cases with a comfortable syntax
- Is unambiguous at parse time without type heuristics
I daresay that, in addition to fulfilling the desired new feature, we have
improved the other cases as well. The final grammar for this is the following:
![Formal grammar showing the final state of the design proposal](https://redacted.moe/f/1d400f58.png)
If you're curious to see more, I've extracted the relevant page of the
specification for you to read: [download it here](https://redacted.moe/f/9c48d5d4.pdf). I
hope you found that interesting and insightful!
*Note: Much of these details are subject to change, and we have future
improvements planned which will affect these features &mdash; particularly with
respect to handling allocation failures. Additionally, some of the code samples
were simplified for illustrative purposes.*