logo

drewdevault.com

[mirror] blog and personal website of Drew DeVault git clone https://hacktivis.me/git/mirror/drewdevault.com.git

Language-design-considerations.md (7017B)


  1. ---
  2. title: Breaking down a small language design proposal
  3. date: 2021-12-30
  4. ---
  5. <style>
  6. .redacted { background: black; color: black; }
  7. </style>
  8. We are developing a new systems programming language. The name is a secret, so
  9. we'll call it <span class="redacted">xxxx</span> instead. In <span
  10. class="redacted">xxxx</span>, we have a general requirement that all variables
  11. must be initialized. This is fine for the simple case, such as "let x: int =
  12. 10". But, it does not always work well. Let's say that you want to set aside a
  13. large buffer for I/O:
  14. ```hare
  15. let x: [1024]int = [0, 0, 0, 0, 0, // ...
  16. ```
  17. This can clearly get out of hand. To address this problem, we added the "..."
  18. operator:
  19. ```hare
  20. let x: [1024]int = [0...];
  21. let y: *[1024]int = alloc([0...]);
  22. ```
  23. This example demonstrates both *stack* allocation of a buffer and *heap*
  24. allocation of a buffer initialized with 1024 zeroes.[^static] This "..."
  25. operator neatly solves our problem. However, another problem occurs to me: what
  26. if you want to allocate a buffer of a variable size?
  27. [^static]: You can also use *static* allocation, which is not shown here.
  28. In addition to arrays, <span class="redacted">xxxx</span> supports slices, which
  29. stores a data pointer, a length, and a capacity. The data pointer refers to an
  30. array whose length is equal to or greater than "capacity", and whose values are
  31. initialized up to "length". We have additional built-ins, "append", "insert",
  32. and "delete", which can dynamically grow and shrink a slice.
  33. ```hare
  34. let x: []int = [];
  35. defer free(x);
  36. append(x, 1, 2, 3, 4, 5); // x = [1, 2, 3, 4, 5]
  37. delete(x[..2]); // x = [3, 4, 5]
  38. insert(x[0], 1, 2); // x = [1, 2, 3, 4, 5]
  39. ```
  40. You can also allocate a slice whose capacity is set to an arbitrary value, but
  41. whose length is only equal to the number of initializers you provide. This is
  42. done through a separate case in the "alloc" grammar:
  43. ```hare
  44. use types;
  45. let x: []int = alloc([1, 2, 3], 10);
  46. assert(len(x) == 3);
  47. assert((&x: *types::slice).capacity == 10);
  48. ```
  49. This is useful if you know how long the slice will eventually be, so that you
  50. can fill it with "append" without re-allocating (which could be costly
  51. otherwise). However, setting the capacity is not the same thing as setting the
  52. length: all of the items between the length and capacity are uninitialized. How
  53. do we zero-initialize a large buffer in the heap?
  54. Until recently, you simply couldn't. You had to use a rather bad
  55. work-around:
  56. ```hare
  57. use rt;
  58. let sz: size = 1024;
  59. let data: *[*]int = rt::malloc(sz * size(int)); // [*] is an array of undefined length
  60. let x: []int = data[..sz];
  61. ```
  62. This is obviously not great. We lose type safety, the initialization guarantee,
  63. and bounds checking, and we add a footgun (multiplying by the member type size),
  64. and it's simply not very pleasant to use. To address this, we added the
  65. following syntax:
  66. ```hare
  67. let sz: size = 1024;
  68. let x: []int = alloc([0...], sz);
  69. ```
  70. Much better! Arriving at this required untangling a lot of other problems that I
  71. haven't mentioned here, but this isn't the design I want to focus on for this
  72. post. Instead, there's a new question this suggests: what about appending a
  73. variable amount of data to a slice? I want to dig into this problem to explore
  74. some of the concerns we think about when working on the language design.
  75. The first idea I came up with was the following:
  76. ```hare
  77. let x: []int = [];
  78. append(x, [0...], 10);
  79. ```
  80. This would append ten zeroes to "x". This has a problem, though. Consider our
  81. earlier example of "append":
  82. ```hare
  83. append(x, 1, 2, 3);
  84. ```
  85. The grammar for this looks like the following:[^caveat]
  86. [^caveat]: Disregard the second case of "append-values"; it's not relevant here.
  87. ![A screenshot of the language spec showing the grammar for append expressions.](https://l.sr.ht/NinS.png)
  88. So, the proposed "append(x, \[0...\], 10)" expression is *parsed* like this:
  89. slice-mutation-expression: append
  90. object-selector: x
  91. append-items:
  92. [0...]
  93. 10
  94. In other words, it looks like "append the values \[0...\] and 10 to x". This
  95. doesn't make sense, but we don't know this until we get to the type checker.
  96. What it really means is "append ten zeroes to x", and we have to identify this
  97. case in the type checker through, essentially, heuristics. Not great! If we dig
  98. deeper into this we find even more edge cases, but I will spare you from the
  99. details.
  100. So, let's consider an alternative design:
  101. ```hare
  102. append(x, [1, 2, 3]); // Previously append(x, 1, 2, 3);
  103. append(x, [0...], 10); // New feature
  104. ```
  105. The grammar for this is much better:
  106. ![A screenshot of the revised grammar for this design.](https://l.sr.ht/s5qI.png)
  107. Now we can distinguish between these cases while parsing, so the first example
  108. is parsed as:
  109. append-expression
  110. object-selector: x
  111. expression: [1, 2, 3] // Items to append
  112. The second is parsed as:
  113. append-expression
  114. object-selector: x
  115. expression: [0...] // Items to append
  116. expression: 10 // Desired length
  117. This is a big improvement, but it comes with one annoying problem. The most
  118. common case for append in regular use in <span class="redacted">xxxx</span> is
  119. appending a single item, and this case has worsened thanks to this change:
  120. ```hare
  121. append(x, [42]); // Previously append(x, 42);
  122. ```
  123. In fact, appending several items at once is exceptionally uncommon: there are no
  124. examples of it in the standard library. We should try to avoid making the common
  125. case worse for the benefit of the uncommon case.
  126. A pattern we *do* see in the standard library is appending one slice to another,
  127. which is a use-case we've ignored up to this point. This use-case looks
  128. something like the following:
  129. ```hare
  130. append(x, y...);
  131. ```
  132. Why don't we lean into this a bit more?
  133. ```hare
  134. let x: []int = [];
  135. append(x, 42); // x = [42]
  136. append(x, [1, 2, 3]...); // x = [42, 1, 2, 3]
  137. append(x, [0...], 6); // x = [42, 1, 2, 3, 0...]
  138. ```
  139. Using the `append(x, y...)` syntax to generally handle appending several items
  140. neatly solves all of our problems. We have arrived at design which:
  141. - Is versatile and utilitarian
  142. - Addresses the most common cases with a comfortable syntax
  143. - Is unambiguous at parse time without type heuristics
  144. I daresay that, in addition to fulfilling the desired new feature, we have
  145. improved the other cases as well. The final grammar for this is the following:
  146. ![Formal grammar showing the final state of the design proposal](https://l.sr.ht/2nF0.png)
  147. If you're curious to see more, I've extracted the relevant page of the
  148. specification for you to read: [download it here](https://l.sr.ht/eeta.pdf). I
  149. hope you found that interesting and insightful!
  150. *Note: Much of these details are subject to change, and we have future
  151. improvements planned which will affect these features &mdash; particularly with
  152. respect to handling allocation failures. Additionally, some of the code samples
  153. were simplified for illustrative purposes.*