
drewdevault.com

[mirror] blog and personal website of Drew DeVault

git clone https://hacktivis.me/git/mirror/drewdevault.com.git
commit: 0b56fa4b1d278a799da1b1ae1a3e931e456cfe0c
parent 4d690763ad6d39fd7e3b744e6264564def0ffca7
Author: Drew DeVault <sir@cmpwn.com>
Date:   Tue,  5 Oct 2021 15:53:05 +0200

Reflection

Diffstat:

A content/blog/Reflection.md | 413 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 413 insertions(+), 0 deletions(-)

---
title: How reflection works in ****
author: Drew DeVault
date: 2021-10-05
---

*Note: this is a redacted copy of a blog post published on the internal
development blog of a new systems programming language. The name of the project
and further details are deliberately being kept in confidence until the initial
release. You may be able to find it if you look hard enough &mdash; you have my
thanks in advance for keeping it to yourself. For more information, see "[We are
building a new systems programming language][post]".*

<style>
.redacted {
	background: black;
	color: black;
}
</style>

[post]: https://drewdevault.com/2021/03/19/A-new-systems-language.html

I've just merged support for reflection in <span class="redacted">xxxx</span>.
Here's how it works!

## Background

"Reflection" refers to the ability of a program to examine the type system of
its programming language, and to dynamically manipulate types and their values
at runtime. You can learn more at [Wikipedia][0].

[0]: https://en.wikipedia.org/wiki/Reflective_programming

## Reflection from a user perspective

Let's start with a small sample program:

```hare
use fmt;
use types;

export fn main() void = {
	const my_type: type = type(int);
	const typeinfo: *types::typeinfo = types::reflect(my_type);
	fmt::printfln("int\nid: {}\nsize: {}\nalignment: {}",
		typeinfo.id, typeinfo.sz, typeinfo.al)!;
};
```

Running this program produces the following output:

```
int
id: 1099590421
size: 4
alignment: 4
```

This gives us a simple starting point to look at. We can see that "type" is used
as the type of the "my_type" variable, and initialized with a "type(int)"
expression. This expression returns a type value for the type given in the
parentheses &mdash; in this case, for the "int" type.

To learn anything useful, we have to convert this to a "types::typeinfo"
pointer, which we do via `types::reflect`. The typeinfo structure looks like
this:

```hare
type typeinfo = struct {
	id: uint,
	sz: size,
	al: size,
	flags: flags,
	repr: repr,
};
```

The ID field is the type's identifier, which is universally unique and
deterministic, and forms part of <span class="redacted">xxxx</span>'s ABI. It
is derived from an FNV-32 hash of the type information. You can find the ID for
any type by modifying our little example program, or you can use the helper
program in the <code>cmd/<span class="redacted">xxxx</span>type</code> directory
of the <span class="redacted">xxxx</span> source tree.

Another important field is the "repr" field, which is short for
"representation" and gives details about the inner structure of the type.
The repr type is defined as a tagged union of all possible type representations
in the <span class="redacted">xxxx</span> type system:

```hare
type repr = (alias | array | builtin | enumerated | func | pointer | slice | struct_union | tagged | tuple);
```

In the case of the "int" type, the representation is "builtin":

```hare
type builtin = enum uint {
	BOOL, CHAR, F32, F64, I16, I32, I64, I8, INT, NULL, RUNE, SIZE, STR, U16, U32,
	U64, U8, UINT, UINTPTR, VOID, TYPE,
};
```

`builtin::INT`, in this case. The structure and representation of the "int" type
is defined by the <span class="redacted">xxxx</span> specification and cannot be
overridden by the program, so no further information is necessary. The relevant
part of the spec is:

!["The precision of 'int' and 'uint' are implementation-defined. 'int' shall be signed, and 'uint' shall be unsigned. Both types shall be at least 32-bits in precision. The precision in bits shall be a power of two."](https://l.sr.ht/Sbcb.png)
![A table from the specification showing the precision ranges of each integer type](https://l.sr.ht/oZw4.png)

More information is provided for more complex types, such as structs.

```hare
use fmt;
use types;

export fn main() void = {
	const my_type: type = type(struct {
		x: int,
		y: int,
	});
	const typeinfo: *types::typeinfo = types::reflect(my_type);
	fmt::printfln("id: {}\nsize: {}\nalignment: {}",
		typeinfo.id, typeinfo.sz, typeinfo.al)!;
	const st = typeinfo.repr as types::struct_union;
	assert(st.kind == types::struct_kind::STRUCT);
	for (let i = 0z; i < len(st.fields); i += 1) {
		const field = st.fields[i];
		assert(field.type_ == type(int));
		fmt::printfln("\t{}: offset {}", field.name, field.offs)!;
	};
};
```

The output of this program is:

```
id: 2617358403
size: 8
alignment: 4
	x: offset 0
	y: offset 4
```

Here the "repr" field provides the "types::struct_union" structure:

```hare
type struct_union = struct {
	kind: struct_kind,
	fields: []struct_field,
};

type struct_kind = enum {
	STRUCT,
	UNION,
};

type struct_field = struct {
	name: str,
	offs: size,
	type_: type,
};
```

Make sense? Excellent. So how does it all work?

## Reflection internals

Let me first draw the curtain back from the magic "types::reflect" function:

```hare
// Returns [[typeinfo]] for the provided type.
export fn reflect(in: type) const *typeinfo = in: *typeinfo;
```

It simply casts the "type" value to a pointer, which is what it is. When the
compiler sees an expression like `let x = type(int)`, it statically allocates
the typeinfo data structure into the program and returns a pointer to it, which
is then wrapped up in the opaque "type" meta-type. The "reflect" function simply
converts it to a useful pointer.

Here's the generated IR for `let x = type(int)`:

```
%binding.4 =l alloc8 8
storel $rt.builtin_int, %binding.4
```

A clever eye will note that we initialize the value to a pointer to
"rt.builtin_int", rather than allocating a typeinfo structure here and now. The
runtime module provides static typeinfos for all built-in types, which look like
this:

```hare
export const @hidden builtin_int: types::typeinfo = types::typeinfo {
	id = 1099590421,
	sz = 4, al = 4, flags = 0,
	repr = types::builtin::INT,
};
```

These are an internal implementation detail, hence "@hidden". But many types are
not built-in, so the compiler is required to statically allocate a typeinfo
structure:

```hare
export fn main() void = {
	let x = type(struct { x: int, y: int });
};
```

```
data $strdata.7 = section ".data.strdata.7" { b "x" }

data $strdata.8 = section ".data.strdata.8" { b "y" }

data $sldata.6 = section ".data.sldata.6" {
	l $strdata.7, l 1, l 1, l 0, l $rt.builtin_int,
	l $strdata.8, l 1, l 1, l 4, l $rt.builtin_int,
}

data $typeinfo.5 = section ".data.typeinfo.5" {
	w 2617358403, z 4,
	l 8,
	l 4,
	w 0, z 4,
	w 5555256, z 4,
	w 0, z 4,
	l $sldata.6, l 2, l 2,
}

export function section ".text.main" "ax" $main() {
@start.0
	%binding.4 =l alloc8 8
@body.1
	storel $typeinfo.5, %binding.4
@.2
	ret
}
```

This has the unfortunate effect of re-generating all of these typeinfo
structures every time someone uses `type(struct { x: int, y: int })`. We still
have one trick up our sleeve, though: type aliases! Most people rarely use
anonymous structs like this, preferring to use a type alias to give them a name
like "coords".

When a type alias is used, the situation improves:

```hare
type coords = struct { x: int, y: int };

export fn main() void = {
	let x = type(coords);
};
```

```
data $strdata.1 = section ".data.strdata.1" { b "coords" }

data $sldata.0 = section ".data.sldata.0" { l $strdata.1, l 6, l 6 }

data $strdata.4 = section ".data.strdata.4" { b "x" }

data $strdata.5 = section ".data.strdata.5" { b "y" }

data $sldata.3 = section ".data.sldata.3" {
	l $strdata.4, l 1, l 1, l 0, l $rt.builtin_int,
	l $strdata.5, l 1, l 1, l 4, l $rt.builtin_int,
}

data $typeinfo.2 = section ".data.typeinfo.2" {
	w 2617358403, z 4,
	l 8,
	l 4,
	w 0, z 4,
	w 5555256, z 4,
	w 0, z 4,
	l $sldata.3, l 2, l 2,
}

data $type.1491593906 = section ".data.type.1491593906" {
	w 1491593906, z 4,
	l 8,
	l 4,
	w 0, z 4,
	w 3241765159, z 4,
	l $sldata.0, l 1, l 1,
	l $typeinfo.2
}

export function section ".text.main" "ax" $main() {
@start.6
	%binding.10 =l alloc8 8
@body.7
	storel $type.1491593906, %binding.10
@.8
	ret
}
```

The declaration of a type alias provides us with the perfect opportunity to
statically allocate a typeinfo singleton for it. Any of these which go unused by
the program are automatically stripped out by the linker thanks to the
`--gc-sections` flag. Also note that a type alias is considered a distinct
representation from the underlying struct type:

```hare
type alias = struct {
	ident: []str,
	secondary: type,
};
```

This explains the differences in the structure of the "type.1491593906" global.
The <code>struct&nbsp;{&nbsp;x:&nbsp;int,&nbsp;y:&nbsp;int&nbsp;}</code> type is
the "secondary" field of this type.

## Future improvements

This is just the first half of the equation. The second half is to provide useful
functions to work with this data. One such example is "types::strenum":

```hare
// Returns the value of the enum at "val" as a string. Aborts if the value is
// not present. Note that this does not work with enums being used as a flag
// type, see [[strflag]] instead.
export fn strenum(ty: type, val: *void) str = {
	const ty = unwrap(ty);
	const en = ty.repr as enumerated;
	const value: u64 = switch (en.storage) {
	case builtin::CHAR, builtin::I8, builtin::U8 =>
		yield *(val: *u8);
	case builtin::I16, builtin::U16 =>
		yield *(val: *u16);
	case builtin::I32, builtin::U32 =>
		yield *(val: *u32);
	case builtin::I64, builtin::U64 =>
		yield *(val: *u64);
	case builtin::INT, builtin::UINT =>
		yield switch (size(int)) {
		case 4 =>
			yield *(val: *u32);
		case 8 =>
			yield *(val: *u64);
		case => abort();
		};
	case builtin::SIZE =>
		yield switch (size(size)) {
		case 4 =>
			yield *(val: *u32);
		case 8 =>
			yield *(val: *u64);
		case => abort();
		};
	case => abort();
	};

	for (let i = 0z; i < len(en.values); i += 1) {
		if (en.values[i].1.u == value) {
			return en.values[i].0;
		};
	};

	abort("enum has invalid value");
};
```

This is used like so:

```hare
use types;
use fmt;

type watchmen = enum {
	VIMES,
	CARROT,
	ANGUA,
	COLON,
	NOBBY = -1,
};

export fn main() void = {
	let officer = watchmen::ANGUA;
	fmt::println(types::strenum(type(watchmen), &officer))!; // Prints ANGUA
};
```

Additional work is required to make more useful tools like this. We will
probably want to introduce a "value" abstraction which can store an arbitrary
value for an arbitrary type, and helper functions to assign to or read from
those values. A particularly complex case is likely to be some kind of helper
for calling a function pointer via reflection, which I may cover in a later
article.

There will also be some work to bring the "types" (reflection) module
closer to the <span class="redacted">xxxx</span>::* namespace, which already
features <span class="redacted">xxxx</span>::ast,
<span class="redacted">xxxx</span>::parse, and
<span class="redacted">xxxx</span>::types, so that the parser, type checker, and
reflection systems are interoperable and work together to implement the
<span class="redacted">xxxx</span> type system.

---

*Want to help us build this language? We are primarily looking for help in the
following domains:*

- *Architectures or operating systems, to help with ports*
- *Compilers & language design*
- *Cryptography implementations*
- *Date & time implementations*
- *Unix*

*If you're an expert in a domain which is not listed, but that you think we
should know about, then feel free to reach out. Experts are preferred; motivated
enthusiasts are acceptable. [Send me an email][mail] if you want to help!*

[mail]: mailto:sir@cmpwn.com