---
title: How reflection works in ****
date: 2021-10-05
outputs: [html, gemtext]
---

*Note: this is a redacted copy of a blog post published on the internal
development blog of a new systems programming language. The name of the project
and further details are deliberately being kept in confidence until the initial
release. You may be able to find it if you look hard enough; you have my
thanks in advance for keeping it to yourself. For more information, see "[We are
building a new systems programming language][post]".*

<style>
.redacted {
	background: black;
	color: black;
}
</style>

[post]: https://drewdevault.com/2021/03/19/A-new-systems-language.html

I've just merged support for reflection in <span class="redacted">xxxx</span>.
Here's how it works!

## Background

"Reflection" refers to the ability for a program to examine the type system of
its programming language, and to dynamically manipulate types and their values
at runtime. You can learn more at [Wikipedia][0].

[0]: https://en.wikipedia.org/wiki/Reflective_programming

## Reflection from a user perspective

Let's start with a small sample program:

```hare
use fmt;
use types;

export fn main() void = {
	const my_type: type = type(int);
	const typeinfo: *types::typeinfo = types::reflect(my_type);
	fmt::printfln("int\nid: {}\nsize: {}\nalignment: {}",
		typeinfo.id, typeinfo.sz, typeinfo.al)!;
};
```

Running this program produces the following output:

```
int
id: 1099590421
size: 4
alignment: 4
```

This gives us a simple starting point to look at. We can see that "type" is used
as the type of the "my_type" variable, which is initialized with a "type(int)"
expression. This expression returns a type value for the type given in the
parentheses: in this case, the "int" type.

To learn anything useful, we have to convert this to a "types::typeinfo"
pointer, which we do via `types::reflect`. The typeinfo structure looks like
this:

```hare
type typeinfo = struct {
	id: uint,
	sz: size,
	al: size,
	flags: flags,
	repr: repr,
};
```

The ID field is the type's unique identifier, which is universally unique and
deterministic, and forms part of <span class="redacted">xxxx</span>'s ABI. It
is derived from an FNV-32 hash of the type information. You can find the ID for
any type by modifying our little example program, or you can use the helper
program in the <code>cmd/<span class="redacted">xxxx</span>type</code> directory
of the <span class="redacted">xxxx</span> source tree.

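The post doesn't say which bytes the compiler feeds into the hash, so the following is only a sketch of the hash function itself. It assumes the 32-bit FNV-1a variant ("FNV-32" could also mean plain FNV-1) and applies it to a hypothetical string description of a type, so it will not reproduce the real IDs shown above. The names `fnv1a32`, `FNV32_OFFSET`, and `FNV32_PRIME` are illustrative.

```hare
use fmt;
use strings;

// Standard 32-bit FNV parameters.
def FNV32_OFFSET: u32 = 2166136261;
def FNV32_PRIME: u32 = 16777619;

// FNV-1a: XOR each byte into the hash, then multiply by the FNV prime.
// Unsigned arithmetic wraps around modulo 2^32.
fn fnv1a32(data: []u8) u32 = {
	let hash = FNV32_OFFSET;
	for (let i = 0z; i < len(data); i += 1) {
		hash ^= data[i]: u32;
		hash *= FNV32_PRIME;
	};
	return hash;
};

export fn main() void = {
	// Hypothetical input: the real compiler hashes its own internal
	// description of the type, not this string.
	const desc = "struct { x: int, y: int }";
	fmt::printfln("{}", fnv1a32(strings::toutf8(desc)))!;
};
```

Because the hash depends only on its input bytes, any encoding that is stable across compilations yields IDs that are deterministic in the way the ABI requires.
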
Another important field is "repr", short for "representation", which gives
details about the inner structure of the type. The repr type is defined as a
tagged union of all possible type representations in the
<span class="redacted">xxxx</span> type system:

```hare
type repr = (alias | array | builtin | enumerated | func | pointer | slice | struct_union | tagged | tuple);
```

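To see which variant a given type lands in, a program can match on the tagged union without binding the value. This is a sketch against the `types` API as described in this post, assuming the ten variants listed above are exhaustive; `reprname` is a hypothetical helper, not part of the module.

```hare
use fmt;
use types;

// Hypothetical helper: returns the name of the repr variant of a type.
fn reprname(ty: type) str = {
	const ti = types::reflect(ty);
	return match (ti.repr) {
	case types::alias =>
		yield "alias";
	case types::array =>
		yield "array";
	case types::builtin =>
		yield "builtin";
	case types::enumerated =>
		yield "enumerated";
	case types::func =>
		yield "func";
	case types::pointer =>
		yield "pointer";
	case types::slice =>
		yield "slice";
	case types::struct_union =>
		yield "struct_union";
	case types::tagged =>
		yield "tagged";
	case types::tuple =>
		yield "tuple";
	};
};

export fn main() void = {
	fmt::printfln("int: {}", reprname(type(int)))!;
	fmt::printfln("[]int: {}", reprname(type([]int)))!;
	fmt::printfln("*int: {}", reprname(type(*int)))!;
};
```
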
In the case of the "int" type, the representation is "builtin":

```hare
type builtin = enum uint {
	BOOL, CHAR, F32, F64, I16, I32, I64, I8, INT, NULL, RUNE, SIZE, STR, U16, U32,
	U64, U8, UINT, UINTPTR, VOID, TYPE,
};
```

`builtin::INT`, in this case. The structure and representation of the "int" type
are defined by the <span class="redacted">xxxx</span> specification and cannot be
overridden by the program, so no further information is necessary. The relevant
part of the spec is:

!["The precision of 'int' and 'uint' are implementation-defined. 'int' shall be signed, and 'uint' shall be unsigned. Both types shall be at least 32-bits in precision. The precision in bits shall be a power of two."](https://l.sr.ht/Sbcb.png)

![A table from the specification showing the precision ranges of each integer type](https://l.sr.ht/oZw4.png)

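Since the precision of "int" is implementation-defined, reflection gives a program a way to check the spec's guarantees at runtime. A minimal sketch, assuming the integer types have no padding bits (so storage size times 8 equals precision); `checked_width` is a hypothetical helper.

```hare
use fmt;
use types;

// Returns the width in bits of a built-in integer type, asserting the
// spec's rules for int and uint. Assumes no padding bits, so that the
// storage size in bytes times 8 equals the precision.
fn checked_width(ty: type) size = {
	const ti = types::reflect(ty);
	assert(ti.repr is types::builtin, "expected a built-in type");
	const bits = ti.sz * 8;
	assert(bits >= 32, "int and uint must be at least 32 bits");
	assert((bits & (bits - 1)) == 0, "precision must be a power of two");
	return bits;
};

export fn main() void = {
	fmt::printfln("int: {} bits", checked_width(type(int)))!;
	fmt::printfln("uint: {} bits", checked_width(type(uint)))!;
};
```
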
More information is provided for more complex types, such as structs.

```hare
use fmt;
use types;

export fn main() void = {
	const my_type: type = type(struct {
		x: int,
		y: int,
	});
	const typeinfo: *types::typeinfo = types::reflect(my_type);
	fmt::printfln("id: {}\nsize: {}\nalignment: {}",
		typeinfo.id, typeinfo.sz, typeinfo.al)!;
	const st = typeinfo.repr as types::struct_union;
	assert(st.kind == types::struct_kind::STRUCT);
	for (let i = 0z; i < len(st.fields); i += 1) {
		const field = st.fields[i];
		assert(field.type_ == type(int));
		fmt::printfln("\t{}: offset {}", field.name, field.offs)!;
	};
};
```

The output of this program is:

```
id: 2617358403
size: 8
alignment: 4
	x: offset 0
	y: offset 4
```

Here the "repr" field provides the "types::struct_union" structure:

```hare
type struct_union = struct {
	kind: struct_kind,
	fields: []struct_field,
};

type struct_kind = enum {
	STRUCT,
	UNION,
};

type struct_field = struct {
	name: str,
	offs: size,
	type_: type,
};
```

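Because each field carries its own `type`, the typeinfo of a struct is enough to recover its full layout, padding included. The following sketch reflects each field's type to get its size and reports the gaps between fields. It assumes the fields are listed in declaration order (the names are chosen so that alphabetical order agrees), and it only handles structs, since union fields all share offset 0.

```hare
use fmt;
use types;

export fn main() void = {
	// An anonymous struct, so that its repr is struct_union, not alias.
	const ti = types::reflect(type(struct { a: u8, b: u32, c: u8 }));
	const st = ti.repr as types::struct_union;
	assert(st.kind == types::struct_kind::STRUCT);

	// Offset of the first byte after the previous field.
	let end = 0z;
	for (let i = 0z; i < len(st.fields); i += 1) {
		const field = st.fields[i];
		const fti = types::reflect(field.type_);
		// Bytes between the previous field's end and this offset are padding.
		fmt::printfln("{}: offset {}, size {}, padding before {}",
			field.name, field.offs, fti.sz, field.offs - end)!;
		end = field.offs + fti.sz;
	};
	fmt::printfln("total size {}, trailing padding {}", ti.sz, ti.sz - end)!;
};
```
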
Makes sense? Excellent. So how does it all work?

## Reflection internals

Let me first draw the curtain back from the magic "types::reflect" function:

```hare
// Returns [[typeinfo]] for the provided type.
export fn reflect(in: type) const *typeinfo = in: *typeinfo;
```

It simply casts the "type" value to a pointer, which is what it is. When the
compiler sees an expression like `let x = type(int)`, it statically allocates
the typeinfo data structure into the program and returns a pointer to it, which
is then wrapped up in the opaque "type" meta-type. The "reflect" function simply
converts it to a useful pointer. Here's the generated IR for this:

```
%binding.4 =l alloc8 8
storel $rt.builtin_int, %binding.4
```

A clever eye will note that we initialize the value to a pointer to
"rt.builtin_int", rather than allocating a typeinfo structure here and now. The
runtime module provides static typeinfos for all built-in types, which look like
this:

```hare
export const @hidden builtin_int: types::typeinfo = types::typeinfo {
	id = 1099590421,
	sz = 4, al = 4, flags = 0,
	repr = types::builtin::INT,
};
```

These are an internal implementation detail, hence "@hidden". But many types are
not built-in, so the compiler is required to statically allocate a typeinfo
structure:

```hare
export fn main() void = {
	let x = type(struct { x: int, y: int });
};
```

```
data $strdata.7 = section ".data.strdata.7" { b "x" }
data $strdata.8 = section ".data.strdata.8" { b "y" }
data $sldata.6 = section ".data.sldata.6" {
	l $strdata.7, l 1, l 1, l 0, l $rt.builtin_int,
	l $strdata.8, l 1, l 1, l 4, l $rt.builtin_int,
}
data $typeinfo.5 = section ".data.typeinfo.5" {
	w 2617358403, z 4,
	l 8,
	l 4,
	w 0, z 4,
	w 5555256, z 4,
	w 0, z 4,
	l $sldata.6, l 2, l 2,
}
export function section ".text.main" "ax" $main() {
@start.0
	%binding.4 =l alloc8 8
@body.1
	storel $typeinfo.5, %binding.4
@.2
	ret
}
```

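Note what this means for comparing types: `$typeinfo.5` belongs to this particular `type(...)` expression, so two spellings of the same anonymous struct may point at different static copies, while both copies carry the same deterministic ID (2617358403 here). Under that reading, a safe identity check compares IDs rather than pointers. A minimal sketch; `same_type` is a hypothetical helper, not part of the `types` module.

```hare
use fmt;
use types;

// Hypothetical helper: two types are the same if their typeinfo IDs match,
// regardless of which static copy of the typeinfo each value points to.
fn same_type(a: type, b: type) bool =
	types::reflect(a).id == types::reflect(b).id;

export fn main() void = {
	const a = type(struct { x: int, y: int });
	const b = type(struct { x: int, y: int });
	fmt::printfln("same type: {}", same_type(a, b))!;
	fmt::printfln("int vs u8: {}", same_type(type(int), type(u8)))!;
};
```
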
This has the unfortunate effect of regenerating all of these typeinfo
structures every time someone uses `type(struct { x: int, y: int })`. We still
have one trick up our sleeve, though: type aliases! Most people don't use
anonymous structs like this very often, preferring a type alias that gives them
a name like "coords". When they do this, the situation improves:

```hare
type coords = struct { x: int, y: int };

export fn main() void = {
	let x = type(coords);
};
```

```
data $strdata.1 = section ".data.strdata.1" { b "coords" }
data $sldata.0 = section ".data.sldata.0" { l $strdata.1, l 6, l 6 }
data $strdata.4 = section ".data.strdata.4" { b "x" }
data $strdata.5 = section ".data.strdata.5" { b "y" }
data $sldata.3 = section ".data.sldata.3" {
	l $strdata.4, l 1, l 1, l 0, l $rt.builtin_int,
	l $strdata.5, l 1, l 1, l 4, l $rt.builtin_int,
}
data $typeinfo.2 = section ".data.typeinfo.2" {
	w 2617358403, z 4,
	l 8,
	l 4,
	w 0, z 4,
	w 5555256, z 4,
	w 0, z 4,
	l $sldata.3, l 2, l 2,
}
data $type.1491593906 = section ".data.type.1491593906" {
	w 1491593906, z 4,
	l 8,
	l 4,
	w 0, z 4,
	w 3241765159, z 4,
	l $sldata.0, l 1, l 1,
	l $typeinfo.2
}
export function section ".text.main" "ax" $main() {
@start.6
	%binding.10 =l alloc8 8
@body.7
	storel $type.1491593906, %binding.10
@.8
	ret
}
```

The declaration of a type alias provides us with the perfect opportunity to
statically allocate a typeinfo singleton for it. Any of these that go unused by
the program are automatically stripped out by the linker, thanks to the
`--gc-sections` flag. Also note that a type alias is considered a distinct
representation from the underlying struct type:

```hare
type alias = struct {
	ident: []str,
	secondary: type,
};
```

This explains the differences in the structure of the "type.1491593906" global.
The <code>struct&nbsp;{&nbsp;x:&nbsp;int,&nbsp;y:&nbsp;int&nbsp;}</code> type is
the "secondary" field of this type.

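Code that cares about a type's structure therefore has to see through aliases first. The sketch below follows the "secondary" field until it reaches a non-alias representation; `underlying` is a hypothetical helper (the post's `strenum` calls an internal `unwrap` for the same purpose, whose exact behavior isn't shown). The `ident` slice holds the alias's name components; per the IR above, that is just "coords" here.

```hare
use fmt;
use types;

type coords = struct { x: int, y: int };

// Hypothetical helper: follows alias representations through their
// "secondary" field until it reaches a non-alias type.
fn underlying(ty: type) const *types::typeinfo = {
	let ti = types::reflect(ty);
	for (ti.repr is types::alias) {
		ti = types::reflect((ti.repr as types::alias).secondary);
	};
	return ti;
};

export fn main() void = {
	const al = types::reflect(type(coords)).repr as types::alias;
	for (let i = 0z; i < len(al.ident); i += 1) {
		fmt::printfln("ident[{}]: {}", i, al.ident[i])!;
	};
	const base = underlying(type(coords));
	fmt::printfln("underlying id: {}, size: {}", base.id, base.sz)!;
};
```
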
## Future improvements

This is just the first half of the equation. The other half is to provide useful
functions to work with this data. One such example is "types::strenum":

```hare
// Returns the value of the enum at "val" as a string. Aborts if the value is
// not present. Note that this does not work with enums being used as a flag
// type, see [[strflag]] instead.
export fn strenum(ty: type, val: *void) str = {
	const ty = unwrap(ty);
	const en = ty.repr as enumerated;
	const value: u64 = switch (en.storage) {
	case builtin::CHAR, builtin::I8, builtin::U8 =>
		yield *(val: *u8);
	case builtin::I16, builtin::U16 =>
		yield *(val: *u16);
	case builtin::I32, builtin::U32 =>
		yield *(val: *u32);
	case builtin::I64, builtin::U64 =>
		yield *(val: *u64);
	case builtin::INT, builtin::UINT =>
		yield switch (size(int)) {
		case 4 =>
			yield *(val: *u32);
		case 8 =>
			yield *(val: *u64);
		case => abort();
		};
	case builtin::SIZE =>
		yield switch (size(size)) {
		case 4 =>
			yield *(val: *u32);
		case 8 =>
			yield *(val: *u64);
		case => abort();
		};
	case => abort();
	};
	for (let i = 0z; i < len(en.values); i += 1) {
		if (en.values[i].1.u == value) {
			return en.values[i].0;
		};
	};
	abort("enum has invalid value");
};
```

This is used like so:

```hare
use types;
use fmt;

type watchmen = enum {
	VIMES,
	CARROT,
	ANGUA,
	COLON,
	NOBBY = -1,
};

export fn main() void = {
	let officer = watchmen::ANGUA;
	fmt::println(types::strenum(type(watchmen), &officer))!; // Prints ANGUA
};
```

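The same data supports the reverse direction, from a member name to its value. This is a sketch, not part of the `types` module: `enumvalue` and `notfound` are hypothetical names, and the only fields it relies on are the ones `strenum` uses (`values[i].0` for the name and `values[i].1.u` for the value). The enum omits the negative member because the post doesn't show how signed values are stored in the `u` field.

```hare
use fmt;
use types;

type watchmen = enum {
	VIMES,
	CARROT,
	ANGUA,
	COLON,
};

// Hypothetical sentinel returned when no member has the requested name.
type notfound = void;

// Hypothetical helper, the inverse of types::strenum: looks up an enum
// member by name and returns its raw value.
fn enumvalue(ty: type, name: str) (u64 | notfound) = {
	let ti = types::reflect(ty);
	// A named enum is an alias; follow it to the enumerated representation.
	for (ti.repr is types::alias) {
		ti = types::reflect((ti.repr as types::alias).secondary);
	};
	const en = ti.repr as types::enumerated;
	for (let i = 0z; i < len(en.values); i += 1) {
		if (en.values[i].0 == name) {
			return en.values[i].1.u;
		};
	};
	return notfound;
};

export fn main() void = {
	const v = enumvalue(type(watchmen), "CARROT");
	if (v is u64) {
		fmt::printfln("CARROT = {}", v as u64)!;
	} else {
		fmt::println("no such member")!;
	};
};
```
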
Additional work is required to build more useful tools like this. We will
probably want to introduce a "value" abstraction which can store an arbitrary
value for an arbitrary type, and helper functions to assign to or read from
those values. A particularly complex case is likely to be some kind of helper
for calling a function pointer via reflection, which I may cover in a later
article. There will also be some work to bring the "types" (reflection) module
closer to the <span class="redacted">xxxx</span>::* namespace, which already
features <span class="redacted">xxxx</span>::ast,
<span class="redacted">xxxx</span>::parse, and
<span class="redacted">xxxx</span>::types, so that the parser, type checker, and
reflection systems are interoperable and work together to implement the
<span class="redacted">xxxx</span> type system.

---

*Want to help us build this language? We are primarily looking for help in the
following domains:*

- *Architectures or operating systems, to help with ports*
- *Compilers & language design*
- *Cryptography implementations*
- *Date & time implementations*
- *Unix*

*If you're an expert in a domain which is not listed, but that you think we
should know about, then feel free to reach out. Experts are preferred; motivated
enthusiasts are acceptable. [Send me an email][mail] if you want to help!*

[mail]: mailto:sir@cmpwn.com