logo

drewdevault.com

[mirror] blog and personal website of Drew DeVault git clone https://hacktivis.me/git/mirror/drewdevault.com.git
commit: 8032ed659cd74af4d9802995c9df444724a2b0b0
parent 980c2b19323ee166cb09b83718df17ba6addcaf7
Author: Drew DeVault <sir@cmpwn.com>
Date:   Thu, 30 Dec 2021 14:13:05 +0100

Breaking down a small language design proposal

Diffstat:

Acontent/blog/Language-design-considerations.md203+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 203 insertions(+), 0 deletions(-)

diff --git a/content/blog/Language-design-considerations.md b/content/blog/Language-design-considerations.md @@ -0,0 +1,203 @@ +--- +title: Breaking down a small language design proposal +date: 2021-12-30 +--- + +<style> +.redacted { background: black; color: black; } +</style> + +We are developing a new systems programming language. The name is a secret, so +we'll call it <span class="redacted">xxxx</span> instead. In <span +class="redacted">xxxx</span>, we have a general requirement that all variables +must be initialized. This is fine for the simple case, such as "let x: int = +10". But, it does not always work well. Let's say that you want to set aside a +large buffer for I/O: + +```hare +let x: [1024]int = [0, 0, 0, 0, 0, // ... +``` + +This can clearly get out of hand. To address this problem, we added the "..." +operator: + +```hare +let x: [1024]int = [0...]; +let y: *[1024]int = alloc([0...]); +``` + +This example demonstrates both *stack* allocation of a buffer and *heap* +allocation of a buffer initialized with 1024 zeroes.[^static] This "..." +operator neatly solves our problem. However, another problem occurs to me: what +if you want to allocate a buffer of a variable size? + +[^static]: You can also use *static* allocation, which is not shown here. + +In addition to arrays, <span class="redacted">xxxx</span> supports slices, which +stores a data pointer, a length, and a capacity. The data pointer refers to an +array whose length is at equal to or greather than "capacity", and whose values +are initialized up to "length". We have additional built-ins, "append", +"insert", and "delete", which can dynamically grow and shrink a slice. + +```hare +let x: []int = []; +defer free(x); +append(x, 1, 2, 3, 4, 5); // x = [1, 2, 3, 4, 5] +delete(x[..2]); // x = [3, 4, 5] +insert(x[0], 1, 2); // x = [1, 2, 3, 4, 5] +``` + +You can also allocate a slice whose capacity is set to an arbitrary value, but +whose length is only equal to the number of initializers you provide. This is +done through a separate case in the "alloc" grammar: + +```hare +use types; + +let x: []int = alloc([1, 2, 3], 10); +assert(len(x) == 3); +assert((&x: *types::slice).capacity == 10); +``` + +This is useful if you know how long the slice will eventually be, so that you +can fill it with "append" without re-allocating (which could be costly +otherwise). However, setting the capacity is not the same thing as setting the +length: all of the items between the length and capacity are uninitialized. How +do we zero-initialize a large buffer in the heap? + +Until recently, you simply couldn't. You had to use a rather bad +work-around: + +```hare +use rt; + +let sz: size = 1024; +let data: *[*]int = rt::malloc(sz * size(int)); // [*] is an array of undefined length +let x: []int = data[..sz]; +``` + +This is obviously not great. We lose type safety, the initialization guarantee, +and bounds checking, and we add a footgun (multiplying by the member type size), +and it's simply not very pleasant to use. To address this, we added the +following syntax: + +```hare +let sz: size = 1024; +let x: []int = alloc([0...], sz); +``` + +Much better! Arriving at this required untangling a lot of other problems that I +haven't mentioned here, but this isn't the design I want to focus on for this +post. Instead, there's a new question this suggests: what about appending a +variable amount of data to a slice? I want to dig into this problem to explore +some of the concerns we think about when working on the language design. + +The first idea I came up with was the following: + +```hare +let x: []int = []; +append(x, [0...], 10); +``` + +This would append ten zeroes to "x". This has a problem, though. Consider our +earlier example of "append": + +```hare +append(x, 1, 2, 3); +``` + +The grammar for this looks like the following:[^caveat] + +[^caveat]: Disregard the second case of "append-values"; it's not relevant here. + +![A screenshot of the language spec showing the grammar for append expressions.](https://l.sr.ht/NinS.png) + +So, the proposed "append(x, \[0...\], 10)" expression is *parsed* like this: + + slice-mutation-expression: append + object-selector: x + append-items: + [0...] + 10 + +In other words, it looks like "append the values \[0...\] and 10 to to x". This +doesn't make sense, but we don't know this until we get to the type checker. +What it really means is "append ten zeroes to x", and we have to identify this +case in the type checker through, essentially, heuristics. Not great! If we dig +deeper into this we find even more edge cases, but I will spare you from the +details. + +So, let's consider an alternative design: + +```hare +append(x, [1, 2, 3]); // Previously append(x, 1, 2, 3); +append(x, [0...], 10); // New feature +``` + +The grammar for this is much better: + +![A screenshot of the revised grammar for this design.](https://l.sr.ht/s5qI.png) + +Now we can distinguish between these cases while parsing, so the first example +is parsed as: + + append-expression + object-selector: x + expression: [1, 2, 3] // Items to append + +The second is parsed as: + + append-expression + object-selector: x + expression: [0...] // Items to append + expression: 10 // Desired length + +This is a big improvement, but it comes with one annoying problem. The most +common case for append in regular use in <span class="redacted">xxxx</span> is +appending a single item, and this case has worsened thanks to this change: + +```hare +append(x, [42]); // Previously append(x, 42); +``` + +In fact, appending several items at once is exceptionally uncommon: there are no +examples of it in the standard library. We should try to avoid making the common +case worse for the benefit of the uncommon case. + +A pattern we *do* see in the standard library is appending one slice to another, +which is a use-case we've ignored up to this point. This use-case looks +something like the following: + +```hare +append(x, y...); +``` + +Why don't we lean into this a bit more? + +```hare +let x: []int = []; +append(x, 42); // x = [42] +append(x, [1, 2, 3]...); // x = [42, 1, 2, 3] +append(x, [0...], 6); // x = [42, 1, 2, 3, 0...] +``` + +Using the `append(x, y...)` syntax to generally handle appending several items +neatly solves all of our problems. We have arrived at design which: + +- Is versatile and utilitarian +- Addresses the most common cases with a comfortable syntax +- Is unambiguous at parse time without type heuristics + +I daresay that, in addition to fulfilling the desired new feature, we have +improved the other cases as well. The final grammar for this is the following: + +![Formal grammar showing the final state of the design proposal](https://l.sr.ht/2nF0.png) + +If you're curious to see more, I've extracted the relevant page of the +specification for you to read: [download it here](https://l.sr.ht/eeta.pdf). I +hope you found that interesting and insightful! + +*Note: Much of these details are subject to change, and we have future +improvements planned which will affect these features &mdash; particularly with +respect to handling allocation failures. Additionally, some of the code samples +were simplified for illustrative purposes.*