commit: 0b56fa4b1d278a799da1b1ae1a3e931e456cfe0c
parent 4d690763ad6d39fd7e3b744e6264564def0ffca7
Author: Drew DeVault <sir@cmpwn.com>
Date: Tue, 5 Oct 2021 15:53:05 +0200
Reflection
Diffstat:
1 file changed, 413 insertions(+), 0 deletions(-)
diff --git a/content/blog/Reflection.md b/content/blog/Reflection.md
@@ -0,0 +1,413 @@
+---
+title: How reflection works in ****
+author: Drew DeVault
+date: 2021-10-05
+---
+
+*Note: this is a redacted copy of a blog post published on the internal
+development blog of a new systems programming language. The name of the project
+and further details are deliberately being kept in confidence until the initial
+release. You may be able to find it if you look hard enough — you have my
+thanks in advance for keeping it to yourself. For more information, see "[We are
+building a new systems programming language][post]".*
+
+<style>
+.redacted {
+ background: black;
+ foreground: black;
+}
+</style>
+
+[post]: https://drewdevault.com/2021/03/19/A-new-systems-language.html
+
+I've just merged support for reflection in <span class="redacted">xxxx</span>.
+Here's how it works!
+
+## Background
+
+"Reflection" refers to the ability for a program to examine the type system of
+its programming langauge, and to dynamically manipulate types and their values
+at runtime. You can learn more at [Wikipedia][0].
+
+[0]: https://en.wikipedia.org/wiki/Reflective_programming
+
+## Reflection from a user perspective
+
+Let's start with a small sample program:
+
+```hare
+use fmt;
+use types;
+
+export fn main() void = {
+ const my_type: type = type(int);
+ const typeinfo: *types::typeinfo = types::reflect(my_type);
+ fmt::printfln("int\nid: {}\nsize: {}\nalignment: {}",
+ typeinfo.id, typeinfo.sz, typeinfo.al)!;
+};
+```
+
+Running this program produces the following output:
+
+```
+int
+id: 1099590421
+size: 4
+alignment: 4
+```
+
+This gives us a simple starting point to look at. We can see that "type" is used
+as the type of the "my_type" variable, and initialized with a "type(int)"
+expression. This expression returns a type value for the type given in the
+parenthesis — in this case, for the "int" type.
+
+To learn anything useful, we have to convert this to a "types::typeinfo"
+pointer, which we do via `types::reflect`. The typeinfo structure looks like
+this:
+
+```hare
+type typeinfo = struct {
+ id: uint,
+ sz: size,
+ al: size,
+ flags: flags,
+ repr: repr,
+};
+```
+
+The ID field is the type's unique identifier, which is universally unique and
+deterministic, and forms part of <span class="redacted">xxxx</span>'s ABI. This
+is derived from an FNV-32 hash of the type information. You can find the ID for
+any type by modifying our little example program, or you can use the helper
+program in the <code>cmd/<span class="redacted">xxxx</span>type</code> directory
+of the <span class="redacted">xxxx</span> source tree.
+
+Another important field is the "repr" field, which is short for
+"representation", and it gives details about the inner structure of the type.
+The repr type is defined as a tagged union of all possible type representations
+in the <span class="redacted">xxxx</span> type system:
+
+```hare
+type repr = (alias | array | builtin | enumerated | func | pointer | slice | struct_union | tagged | tuple);
+```
+
+In the case of the "int" type, the representation is "builtin":
+
+```hare
+type builtin = enum uint {
+ BOOL, CHAR, F32, F64, I16, I32, I64, I8, INT, NULL, RUNE, SIZE, STR, U16, U32,
+ U64, U8, UINT, UINTPTR, VOID, TYPE,
+};
+```
+
+`builtin::INT`, in this case. The structure and representation of the "int" type
+is defined by the <span class="redacted">xxxx</span> specification and cannot be
+overridden by the program, so no further information is necessary. The relevant
+part of the spec is:
+
+!["The precision of 'int' and 'uint' are implementation-defined. 'int' shall be signed, and 'uint' shall be unsigned. Both types shall be at least 32-bits in precision. The precision in bits shall be a power of two."](https://l.sr.ht/Sbcb.png)
+![A table from the specification showing the precision ranges of each integer type](https://l.sr.ht/oZw4.png)
+
+More information is provided for more complex types, such as structs.
+
+```hare
+use fmt;
+use types;
+
+export fn main() void = {
+ const my_type: type = type(struct {
+ x: int,
+ y: int,
+ });
+ const typeinfo: *types::typeinfo = types::reflect(my_type);
+ fmt::printfln("id: {}\nsize: {}\nalignment: {}",
+ typeinfo.id, typeinfo.sz, typeinfo.al)!;
+ const st = typeinfo.repr as types::struct_union;
+ assert(st.kind == types::struct_kind::STRUCT);
+ for (let i = 0z; i < len(st.fields); i += 1) {
+ const field = st.fields[i];
+ assert(field.type_ == type(int));
+ fmt::printfln("\t{}: offset {}", field.name, field.offs)!;
+ };
+};
+```
+
+The output of this program is:
+
+```
+id: 2617358403
+size: 8
+alignment: 4
+ x: offset 0
+ y: offset 4
+```
+
+Here the "repr" field provides the "types::struct_union" structure:
+
+```hare
+type struct_union = struct {
+ kind: struct_kind,
+ fields: []struct_field,
+};
+
+type struct_kind = enum {
+ STRUCT,
+ UNION,
+};
+
+type struct_field = struct {
+ name: str,
+ offs: size,
+ type_: type,
+};
+```
+
+Makes sense? Excellent. So how does it all work?
+
+## Reflection internals
+
+Let me first draw the curtain back from the magic "types::reflect" function:
+
+```hare
+// Returns [[typeinfo]] for the provided type.
+export fn reflect(in: type) const *typeinfo = in: *typeinfo;
+```
+
+It simply casts the "type" value to a pointer, which is what it is. When the
+compiler sees an expression like `let x = type(int)`, it statically allocates
+the typeinfo data structure into the program and returns a pointer to it, which
+is then wrapped up in the opaque "type" meta-type. The "reflect" function simply
+converts it to a useful pointer. Here's the generated IR for this:
+
+```hare
+%binding.4 =l alloc8 8
+storel $rt.builtin_int, %binding.4
+```
+
+A clever eye will note that we initialize the value to a pointer to
+"rt.builtin_int", rather than allocating a typeinfo structure here and now. The
+runtime module provides static typeinfos for all built-in types, which look like
+this:
+
+```hare
+export const @hidden builtin_int: types::typeinfo = types::typeinfo {
+ id = 1099590421,
+ sz = 4, al = 4, flags = 0,
+ repr = types::builtin::INT,
+};
+```
+
+These are an internal implementation detail, hence "@hidden". But many types are
+not built-in, so the compiler is required to statically allocate a typeinfo
+structure:
+
+```hare
+export fn main() void = {
+ let x = type(struct { x: int, y: int });
+};
+```
+
+```
+data $strdata.7 = section ".data.strdata.7" { b "x" }
+
+data $strdata.8 = section ".data.strdata.8" { b "y" }
+
+data $sldata.6 = section ".data.sldata.6" {
+ l $strdata.7, l 1, l 1, l 0, l $rt.builtin_int,
+ l $strdata.8, l 1, l 1, l 4, l $rt.builtin_int,
+}
+
+data $typeinfo.5 = section ".data.typeinfo.5" {
+ w 2617358403, z 4,
+ l 8,
+ l 4,
+ w 0, z 4,
+ w 5555256, z 4,
+ w 0, z 4,
+ l $sldata.6, l 2, l 2,
+}
+
+export function section ".text.main" "ax" $main() {
+@start.0
+ %binding.4 =l alloc8 8
+@body.1
+ storel $typeinfo.5, %binding.4
+@.2
+ ret
+}
+```
+
+This has the unfortunate effect of re-generating all of these typeinfo
+structures every time someone uses `type(struct { x: int, y: int })`. We still
+have one trick up our sleeve, though: type aliases! Most people don't actually
+use anonymous structs like this often, preferring to use a type alias to give
+them a name like "coords". When they do this, the situation improves:
+
+```hare
+type coords = struct { x: int, y: int };
+
+export fn main() void = {
+ let x = type(coords);
+};
+```
+
+```
+data $strdata.1 = section ".data.strdata.1" { b "coords" }
+
+data $sldata.0 = section ".data.sldata.0" { l $strdata.1, l 6, l 6 }
+
+data $strdata.4 = section ".data.strdata.4" { b "x" }
+
+data $strdata.5 = section ".data.strdata.5" { b "y" }
+
+data $sldata.3 = section ".data.sldata.3" {
+ l $strdata.4, l 1, l 1, l 0, l $rt.builtin_int,
+ l $strdata.5, l 1, l 1, l 4, l $rt.builtin_int,
+}
+
+data $typeinfo.2 = section ".data.typeinfo.2" {
+ w 2617358403, z 4,
+ l 8,
+ l 4,
+ w 0, z 4,
+ w 5555256, z 4,
+ w 0, z 4,
+ l $sldata.3, l 2, l 2,
+}
+
+data $type.1491593906 = section ".data.type.1491593906" {
+ w 1491593906, z 4,
+ l 8,
+ l 4,
+ w 0, z 4,
+ w 3241765159, z 4,
+ l $sldata.0, l 1, l 1,
+ l $typeinfo.2
+}
+
+export function section ".text.main" "ax" $main() {
+@start.6
+ %binding.10 =l alloc8 8
+@body.7
+ storel $type.1491593906, %binding.10
+@.8
+ ret
+}
+```
+
+The declaration of a type alias provides us with the perfect opportunity to
+statically allocate a typeinfo singleton for it. Any of these which go unused by
+the program are automatically stripped out by the linker thanks to the
+`--gc-sections` flag. Also note that a type alias is considered a distinct
+representation from the underlying struct type:
+
+```hare
+type alias = struct {
+ ident: []str,
+ secondary: type,
+};
+```
+
+This explains the differences in the structure of the "type.1491593906" global.
+The <code>struct { x: int, y: int }</code> type is
+the "secondary" field of this type.
+
+## Future improvements
+
+This is just the first half of the equation. The next half is to provide useful
+functions to work with this data. One such example is "types::strenum":
+
+```hare
+// Returns the value of the enum at "val" as a string. Aborts if the value is
+// not present. Note that this does not work with enums being used as a flag
+// type, see [[strflag]] instead.
+export fn strenum(ty: type, val: *void) str = {
+ const ty = unwrap(ty);
+ const en = ty.repr as enumerated;
+ const value: u64 = switch (en.storage) {
+ case builtin::CHAR, builtin::I8, builtin::U8 =>
+ yield *(val: *u8);
+ case builtin::I16, builtin::U16 =>
+ yield *(val: *u16);
+ case builtin::I32, builtin::U32 =>
+ yield *(val: *u32);
+ case builtin::I64, builtin::U64 =>
+ yield *(val: *u64);
+ case builtin::INT, builtin::UINT =>
+ yield switch (size(int)) {
+ case 4 =>
+ yield *(val: *u32);
+ case 8 =>
+ yield *(val: *u64);
+ case => abort();
+ };
+ case builtin::SIZE =>
+ yield switch (size(size)) {
+ case 4 =>
+ yield *(val: *u32);
+ case 8 =>
+ yield *(val: *u64);
+ case => abort();
+ };
+ case => abort();
+ };
+
+ for (let i = 0z; i < len(en.values); i += 1) {
+ if (en.values[i].1.u == value) {
+ return en.values[i].0;
+ };
+ };
+
+ abort("enum has invalid value");
+};
+```
+
+This is used like so:
+
+```hare
+use types;
+use fmt;
+
+type watchmen = enum {
+ VIMES,
+ CARROT,
+ ANGUA,
+ COLON,
+ NOBBY = -1,
+};
+
+export fn main() void = {
+ let officer = watchmen::ANGUA;
+ fmt::println(types::strenum(type(watchmen), &officer))!; // Prints ANGUA
+};
+```
+
+Additional work is required to make more useful tools like this. We will
+probably want to introduce a "value" abstraction which can store an arbitrary
+value for an arbitrary type, and helper functions to assign to or read from
+those values. A particularly complex case is likely to be some kind of helper
+for calling a function pointer via reflection, which we I may cover in a later
+article. There will also be some work to bring the "types" (reflection) module
+closer to the <span class="redacted">xxxx</span>::* namespace, which already
+features <span class="redacted">xxxx</span>::ast, <span
+class="redacted">xxxx</span>::parse, and <span
+class="redacted">xxxx</span>::types, so that the parser, type checker, and
+reflection systems are interopable and work together to implement the <span
+class="redacted">xxxx</span> type system.
+
+---
+
+*Want to help us build this language? We are primarily looking for help in the
+following domains:*
+
+- *Architectures or operating systems, to help with ports*
+- *Compilers & language design*
+- *Cryptography implementations*
+- *Date & time implementations*
+- *Unix*
+
+*If you're an expert in a domain which is not listed, but that you think we
+should know about, then feel free to reach out. Experts are perferred, motivated
+enthusiasts are acceptable. [Send me an email][mail] if you want to help!*
+
+[mail]: mailto:sir@cmpwn.com