commit: 591f34d6d28ab7ee4a318bb2b5bcb9a041a8f95a
parent 5d5a76d3dddafe452c18f1e7520d76504fff0ffe
Author: Drew DeVault <sir@cmpwn.com>
Date: Sat, 14 May 2022 15:27:23 +0200
A Hare code generator for finding ioctl numbers
Diffstat:
1 file changed, 514 insertions(+), 0 deletions(-)
diff --git a/content/blog/generating-ioctls.md b/content/blog/generating-ioctls.md
@@ -0,0 +1,514 @@
+---
+title: A Hare code generator for finding ioctl numbers
+date: 2022-05-14
+---
+
+Modern Unix derivatives have this really bad idea called [ioctl]. It's a
+function which performs arbitrary operations on a file descriptor. It is
+essentially the kitchen sink of modern Unix derivatives, particularly Linux,
+where ioctls act almost like a second set of syscalls. For example, to get
+the size of the terminal window, you use an ioctl specific to TTY file
+descriptors:
+
+[ioctl]: https://pubs.opengroup.org/onlinepubs/9699919799/functions/ioctl.html
+
+```hare
+let wsz = rt::winsize { ... };
+match (rt::ioctl(fd, rt::TIOCGWINSZ, &wsz: *void)) {
+case let e: rt::errno =>
+ switch (e: int) {
+ case rt::EBADF =>
+ return errors::invalid;
+ case rt::ENOTTY =>
+ return errors::unsupported;
+ case =>
+ abort("Unexpected error from ioctl");
+ };
+case int =>
+ return ttysize {
+ rows = wsz.ws_row,
+ columns = wsz.ws_col,
+ };
+};
+```
+
+This code performs the ioctl syscall against the provided file descriptor "fd",
+using the "TIOCGWINSZ" operation, and setting the parameter to a pointer to a
+winsize structure. There are thousands of ioctls provided by Linux, and each of
+them is assigned a constant like TIOCGWINSZ (0x5413). Some constants, including
+this one, are assigned somewhat arbitrarily. However, some are assigned with
+some degree of structure.
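+
+For comparison, here is the same request in C, where the libc ioctl() wrapper
+returns -1 and sets errno on failure. This is only a sketch. The tty\_size
+helper and its negative-errno return convention are hypothetical, and 0x5413 is
+the TIOCGWINSZ value on x86\_64 specifically.
+
+```c
+#include <errno.h>
+#include <sys/ioctl.h>
+
+/*
+ * Hypothetical helper mirroring the Hare snippet above: query the size of
+ * the terminal behind fd. Returns 0 on success, or a negative errno value,
+ * e.g. -EBADF if fd is not open or -ENOTTY if it is not a terminal.
+ */
+static int tty_size(int fd, unsigned short *rows, unsigned short *cols)
+{
+	struct winsize ws = {0};
+
+	if (ioctl(fd, TIOCGWINSZ, &ws) == -1)
+		return -errno;
+
+	*rows = ws.ws_row;
+	*cols = ws.ws_col;
+	return 0;
+}
+```
+
+Calling it on a regular file or /dev/null fails with -ENOTTY, which the Hare
+version maps to errors::unsupported.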
+
+Consider for instance the ioctl TUNSETOWNER, which is used for tun/tap network
+devices. This ioctl is assigned the number 0x400454cc, but this is not selected
+arbitrarily. It's assigned with a macro, which we can find in
+/usr/include/linux/if_tun.h:
+
+```c
+#define TUNSETOWNER _IOW('T', 204, int)
+```
+
+The \_IOW macro and its siblings \_IO, \_IOR, and \_IOWR are defined in
+/usr/include/asm-generic/ioctl.h. They shift the letter, the number, the size
+of the parameter type, and the direction (read, write, both, or neither) into
+separate bit fields and OR them together into an unsigned 32-bit number:
+
+```c
+#define _IOC_WRITE 1U
+
+#define _IOC_TYPECHECK(t) (sizeof(t))
+
+#define _IOC(dir,type,nr,size) \
+ (((dir) << _IOC_DIRSHIFT) | \
+ ((type) << _IOC_TYPESHIFT) | \
+ ((nr) << _IOC_NRSHIFT) | \
+ ((size) << _IOC_SIZESHIFT))
+
+#define _IOW(type,nr,size) _IOC(_IOC_WRITE,(type),(nr),(_IOC_TYPECHECK(size)))
+```
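+
+The excerpt omits the field widths and shift amounts. The sketch below fills
+them in with the values from asm-generic/ioctl.h: an 8-bit number, an 8-bit
+type, a 14-bit size, and a 2-bit direction. It assumes an architecture such as
+x86\_64 that uses this generic layout, and it reproduces TUNSETOWNER's number.
+The ioc function and the IOC\_\* names are our own, chosen so they don't
+collide with the real macros.
+
+```c
+#include <stdint.h>
+
+/* Field widths from asm-generic/ioctl.h (not every architecture uses them). */
+#define IOC_NRBITS   8
+#define IOC_TYPEBITS 8
+#define IOC_SIZEBITS 14
+
+#define IOC_NRSHIFT   0
+#define IOC_TYPESHIFT (IOC_NRSHIFT + IOC_NRBITS)     /* 8 */
+#define IOC_SIZESHIFT (IOC_TYPESHIFT + IOC_TYPEBITS) /* 16 */
+#define IOC_DIRSHIFT  (IOC_SIZESHIFT + IOC_SIZEBITS) /* 30 */
+
+/* Direction bits, matching _IOC_NONE, _IOC_WRITE and _IOC_READ. */
+enum { IOC_NONE = 0, IOC_WRITE = 1, IOC_READ = 2 };
+
+/* A function version of _IOC: shift each field into its own bit range
+ * and OR the results together. */
+static uint32_t ioc(uint32_t dir, uint32_t type, uint32_t nr, uint32_t size)
+{
+	return (dir << IOC_DIRSHIFT) |
+	       (type << IOC_TYPESHIFT) |
+	       (nr << IOC_NRSHIFT) |
+	       (size << IOC_SIZESHIFT);
+}
+```
+
+Here `ioc(IOC_WRITE, 'T', 204, sizeof(int))` evaluates to
+0x40000000 | (4 << 16) | (0x54 << 8) | 204, which is 0x400454cc.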
+
+It would be useful to define ioctl numbers in a similar fashion for Hare
+programs. However, Hare lacks macros, so we cannot re-implement this in exactly
+the same manner. Instead, we can use code generation.
+
+*[Hare](https://harelang.org) is a new systems programming language I've been
+working on for a couple of years. Check out the [announcement][0] for more
+detail.*
+
+[0]: https://harelang.org/blog/2022-04-25-announcing-hare/
+
+Again using the tun interface as an example, our goal is to turn the following
+input file:
+
+```hare
+type sock_filter = struct {
+ code: u16,
+ jt: u8,
+ jf: u8,
+ k: u32,
+};
+
+type sock_fprog = struct {
+ length: u16,
+ filter: *sock_filter,
+};
+
+def TUNSETNOCSUM: u32 = @_IOW('T', 200, int);
+def TUNSETDEBUG: u32 = @_IOW('T', 201, int);
+def TUNSETIFF: u32 = @_IOW('T', 202, int);
+def TUNSETPERSIST: u32 = @_IOW('T', 203, int);
+def TUNSETOWNER: u32 = @_IOW('T', 204, int);
+def TUNSETLINK: u32 = @_IOW('T', 205, int);
+def TUNSETGROUP: u32 = @_IOW('T', 206, int);
+def TUNGETFEATURES: u32 = @_IOR('T', 207, uint);
+def TUNSETOFFLOAD: u32 = @_IOW('T', 208, uint);
+def TUNSETTXFILTER: u32 = @_IOW('T', 209, uint);
+def TUNGETIFF: u32 = @_IOR('T', 210, uint);
+def TUNGETSNDBUF: u32 = @_IOR('T', 211, int);
+def TUNSETSNDBUF: u32 = @_IOW('T', 212, int);
+def TUNATTACHFILTER: u32 = @_IOW('T', 213, sock_fprog);
+def TUNDETACHFILTER: u32 = @_IOW('T', 214, sock_fprog);
+def TUNGETVNETHDRSZ: u32 = @_IOR('T', 215, int);
+def TUNSETVNETHDRSZ: u32 = @_IOW('T', 216, int);
+def TUNSETQUEUE: u32 = @_IOW('T', 217, int);
+def TUNSETIFINDEX: u32 = @_IOW('T', 218, uint);
+def TUNGETFILTER: u32 = @_IOR('T', 219, sock_fprog);
+def TUNSETVNETLE: u32 = @_IOW('T', 220, int);
+def TUNGETVNETLE: u32 = @_IOR('T', 221, int);
+def TUNSETVNETBE: u32 = @_IOW('T', 222, int);
+def TUNGETVNETBE: u32 = @_IOR('T', 223, int);
+def TUNSETSTEERINGEBPF: u32 = @_IOR('T', 224, int);
+def TUNSETFILTEREBPF: u32 = @_IOR('T', 225, int);
+def TUNSETCARRIER: u32 = @_IOW('T', 226, int);
+def TUNGETDEVNETNS: u32 = @_IO('T', 227);
+```
+
+Into the following output file:
+
+```hare
+type sock_filter = struct {
+ code: u16,
+ jt: u8,
+ jf: u8,
+ k: u32,
+};
+
+type sock_fprog = struct {
+ length: u16,
+ filter: *sock_filter,
+};
+
+def TUNSETNOCSUM: u32 = 0x400454c8;
+def TUNSETDEBUG: u32 = 0x400454c9;
+def TUNSETIFF: u32 = 0x400454ca;
+def TUNSETPERSIST: u32 = 0x400454cb;
+def TUNSETOWNER: u32 = 0x400454cc;
+def TUNSETLINK: u32 = 0x400454cd;
+def TUNSETGROUP: u32 = 0x400454ce;
+def TUNGETFEATURES: u32 = 0x800454cf;
+def TUNSETOFFLOAD: u32 = 0x400454d0;
+def TUNSETTXFILTER: u32 = 0x400454d1;
+def TUNGETIFF: u32 = 0x800454d2;
+def TUNGETSNDBUF: u32 = 0x800454d3;
+def TUNSETSNDBUF: u32 = 0x400454d4;
+def TUNATTACHFILTER: u32 = 0x401054d5;
+def TUNDETACHFILTER: u32 = 0x401054d6;
+def TUNGETVNETHDRSZ: u32 = 0x800454d7;
+def TUNSETVNETHDRSZ: u32 = 0x400454d8;
+def TUNSETQUEUE: u32 = 0x400454d9;
+def TUNSETIFINDEX: u32 = 0x400454da;
+def TUNGETFILTER: u32 = 0x801054db;
+def TUNSETVNETLE: u32 = 0x400454dc;
+def TUNGETVNETLE: u32 = 0x800454dd;
+def TUNSETVNETBE: u32 = 0x400454de;
+def TUNGETVNETBE: u32 = 0x800454df;
+def TUNSETSTEERINGEBPF: u32 = 0x800454e0;
+def TUNSETFILTEREBPF: u32 = 0x800454e1;
+def TUNSETCARRIER: u32 = 0x400454e2;
+def TUNGETDEVNETNS: u32 = 0x54e3;
+```
+
+I wrote the [ioctlgen] tool for this purpose, and since it demonstrates a number
+of interesting Hare features, I thought it would make for a cool blog post. This
+program must do the following things:
+
+[ioctlgen]: https://git.sr.ht/~sircmpwn/hare/tree/master/item/cmd/ioctlgen/main.ha
+
+
+- Scan through the file looking for @\_IO\* constructs
+- Parse these @\_IO\* constructs
+- Determine the size of the type specified by the third parameter
+- Compute the ioctl number based on these inputs
+- Write the computed constant to the output
+- Pass everything else through unmodified
+
+The implementation begins thusly:
+
+```hare
+let ioctlre: regex::regex = regex::regex { ... };
+let typedefre: regex::regex = regex::regex { ... };
+
+@init fn init() void = {
+ ioctlre = regex::compile(`@(_IO[RW]*)\((.*)\)`)!;
+ typedefre = regex::compile(`^(export )?type `)!;
+};
+
+@fini fn fini() void = {
+ regex::finish(&ioctlre);
+ regex::finish(&typedefre);
+};
+```
+
+This sets aside two regular expressions: one that identifies type aliases (so
+that we can parse them to determine their size later), and one that identifies
+our @\_IO\* pseudo-macros. I also defined some types to store each of the
+details necessary to compute the ioctl assignment:
+
+```hare
+type dir = enum u32 {
+ IO = 0,
+ IOW = 1,
+ IOR = 2,
+ IOWR = IOW | IOR,
+};
+
+type ioctl = (dir, rune, u32, const nullable *types::_type);
+```
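+
+The enum values aren't arbitrary. They match \_IOC\_NONE (0), \_IOC\_WRITE (1),
+and \_IOC\_READ (2) from the generic header, so a read-write ioctl simply has
+both bits set. Below is a C rendering of the same enum. It also includes a
+hypothetical parse\_dir helper that maps a matched macro name to its direction;
+the Hare program does this step with a switch in its main loop.
+
+```c
+#include <string.h>
+
+/* Mirrors the Hare dir enum; the values match the kernel's direction bits. */
+enum dir {
+	DIR_IO = 0,
+	DIR_IOW = 1,
+	DIR_IOR = 2,
+	DIR_IOWR = DIR_IOW | DIR_IOR,
+};
+
+/* Map a pseudo-macro name such as "_IOWR" to its direction, or return -1
+ * for a name the generator does not recognize. */
+static int parse_dir(const char *name)
+{
+	if (strcmp(name, "_IO") == 0)
+		return DIR_IO;
+	if (strcmp(name, "_IOR") == 0)
+		return DIR_IOR;
+	if (strcmp(name, "_IOW") == 0)
+		return DIR_IOW;
+	if (strcmp(name, "_IOWR") == 0)
+		return DIR_IOWR;
+	return -1;
+}
+```
+
+Unknown names such as \_IORW return -1, much as the Hare version aborts with
+"Unknown ioctl direction".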
+
+Hare's standard library includes tools for parsing and analyzing Hare programs
+in the [hare namespace]. We'll need to use these to work with types in this
+program. At the start of the program, we initialize a "type store" from
+hare::types, which provides a mechanism with which Hare types can be processed
+and stored. The representation of Hare types varies depending on the
+architecture (for example, pointer types have different sizes on 32-bit and
+64-bit systems), so we have to specify the architecture we want. In the future
+it will be necessary to make this configurable, but for now I just hard-coded
+x86\_64:
+
+[hare namespace]: https://docs.harelang.org/hare
+
+```hare
+const store = types::store(types::x86_64, null, null);
+defer types::store_free(store);
+```
+
+The two null parameters are unused here. They exist to support evaluating
+expressions in type definitions, such as `[8 * 16]int`. Leaving them null is
+permissible, but disables that feature.
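+
+The architecture dependence is easy to see with sock\_fprog from the example
+input. In this C sketch, the \_mirror suffix is ours, added to avoid clashing
+with the definitions in linux/filter.h. The u16 field is followed by padding
+so that the pointer is aligned, which makes the struct two pointers wide: 16
+bytes on x86\_64, but 8 on a typical 32-bit target.
+
+```c
+#include <stddef.h> /* offsetof, for inspecting the layout */
+#include <stdint.h>
+
+/* C mirrors of the Hare types from the example input. */
+struct sock_filter_mirror {
+	uint16_t code;
+	uint8_t jt;
+	uint8_t jf;
+	uint32_t k;
+};
+
+struct sock_fprog_mirror {
+	uint16_t length;                   /* offset 0, then padding */
+	struct sock_filter_mirror *filter; /* aligned to the pointer size */
+};
+```
+
+That 16 is the 0x10 in the size field of TUNATTACHFILTER's expected number,
+0x401054d5.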
+
+Following this, we enter a loop which processes the input file line-by-line,
+testing each line against our regular expressions and doing some logic on them
+if they match. Let's start with the code for handling new types:
+
+```hare
+for (true) {
+ const line = match (bufio::scanline(os::stdin)!) {
+ case io::EOF =>
+ break;
+ case let line: []u8 =>
+ yield strings::fromutf8(line);
+ };
+ defer free(line);
+
+ if (regex::test(&typedefre, line)!) {
+ bufio::unreadrune(os::stdin, '\n');
+ bufio::unread(os::stdin, strings::toutf8(line));
+ loadtype(store);
+ continue;
+ };
+
+ // ...to be continued...
+```
+
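+If we encounter a line which matches our type declaration regular expression,
+we push it back into the (buffered) standard input stream and call this
+"loadtype" function to parse it and load it into the type store. Since
+scanline strips the trailing newline, we unread a newline first and then the
+line in front of it, restoring the input as it was.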
+
+```hare
+fn loadtype(store: *types::typestore) void = {
+ const tee = io::tee(os::stdin, os::stdout);
+ const lex = lex::init(&tee, "<ioctl>");
+ const decl = match (parse::decl(&lex)) {
+ case let err: parse::error =>
+ fmt::fatal("Error parsing type declaration:",
+ parse::strerror(err));
+ case let decl: ast::decl =>
+ yield decl;
+ };
+
+ const tdecl = decl.decl as []ast::decl_type;
+ if (len(tdecl) != 1) {
+ fmt::fatal("Multiple type declarations are unsupported");
+ };
+ const tdecl = tdecl[0];
+ const of = types::lookup(store, &tdecl._type)!;
+ types::newalias(store, tdecl.ident, of);
+};
+```
+
+Hare's standard library includes a lexer and parser for Hare itself, which
+we're making use of here. The first thing we do is use [io::tee] to copy any
+data the parser reads into stdout, passing it through to the output file. Then
+we set up a lexer and parse the type declaration. A type declaration looks
+something like this:
+
+[io::tee]: https://docs.harelang.org/io#tee
+
+```hare
+type sock_fprog = struct {
+ length: u16,
+ filter: *sock_filter,
+};
+```
+
+The types::lookup call looks up the struct type, and types::newalias creates a
+new type alias based on that type with the given name (sock\_fprog). Adding this
+to the type store will let us resolve the type when we encounter it later on,
+for example in this line:
+
+```hare
+def TUNGETFILTER: u32 = @_IOR('T', 219, sock_fprog);
+```
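+
+The io::tee trick in loadtype is worth a closer look. Every byte the parser
+reads from stdin is also written to stdout, so type declarations pass through
+to the output untouched. Below is a rough stdio analogue. tee\_read is a
+hypothetical helper, not a libc function, that copies whatever it reads into a
+second stream.
+
+```c
+#include <stdio.h>
+
+/* Read up to len bytes from in into buf, and write the same bytes to out.
+ * Returns the number of bytes read. */
+static size_t tee_read(FILE *in, FILE *out, char *buf, size_t len)
+{
+	size_t got = fread(buf, 1, len, in);
+
+	fwrite(buf, 1, got, out);
+	return got;
+}
+```
+
+Unlike io::tee, which wraps the source in a stream the lexer reads from
+transparently, this helper has to be called explicitly for each read.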
+
+Back in the main loop, we have another regex test to check whether we're looking
+at a line with one of these pseudo-macros:
+
+```hare
+let groups = match (regex::find(&ioctlre, line)!) {
+case void =>
+ fmt::println(line)!;
+ continue;
+case let cap: []regex::capture =>
+ yield cap;
+};
+defer free(groups);
+
+const dir = switch (groups[1].content) {
+case "_IO" =>
+ yield dir::IO;
+case "_IOR" =>
+ yield dir::IOR;
+case "_IOW" =>
+ yield dir::IOW;
+case "_IOWR" =>
+ yield dir::IOWR;
+case =>
+ fmt::fatalf("Unknown ioctl direction {}", groups[1].content);
+};
+const ioctl = parseioctl(store, dir, groups[2].content);
+```
+
+Recall that the regex from earlier is `@(_IO[RW]*)\((.*)\)`. This has two
+capture groups: one for "\_IO" or "\_IOW" and so on, and another for the list of
+"parameters" (the zeroth "capture group" is the entire matched string). We use the
+first capture group to grab the ioctl direction, then we pass that into
+"parseioctl" along with the type store and the second capture group.
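+
+The same pattern works as a POSIX extended regular expression, so C's regcomp
+and regexec can demonstrate the capture groups. In this sketch, match\_ioctl is
+a hypothetical helper. For brevity it compiles the pattern on every call,
+whereas ioctlgen compiles its patterns once in @init.
+
+```c
+#include <regex.h>
+#include <stdio.h>
+
+/*
+ * Match line against the ioctlgen pattern and copy capture group 1 (the
+ * macro name) and group 2 (the parameter list) into the caller's buffers.
+ * Group 0, the whole match, is not copied. Returns 1 on a match, 0 if
+ * there is none, and -1 if the pattern fails to compile.
+ */
+static int match_ioctl(const char *line, char *name, size_t namelen,
+		       char *params, size_t paramslen)
+{
+	regex_t re;
+	regmatch_t m[3];
+	int found;
+
+	if (regcomp(&re, "@(_IO[RW]*)\\((.*)\\)", REG_EXTENDED) != 0)
+		return -1;
+	found = regexec(&re, line, 3, m, 0) == 0;
+	if (found) {
+		snprintf(name, namelen, "%.*s",
+			 (int)(m[1].rm_eo - m[1].rm_so), line + m[1].rm_so);
+		snprintf(params, paramslen, "%.*s",
+			 (int)(m[2].rm_eo - m[2].rm_so), line + m[2].rm_so);
+	}
+	regfree(&re);
+	return found;
+}
+```
+
+For `def TUNSETOWNER: u32 = @_IOW('T', 204, int);` this yields \_IOW in group 1
+and `'T', 204, int` in group 2.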
+
+This "parseioctl" function is kind of neat:
+
+```hare
+fn parseioctl(store: *types::typestore, d: dir, params: str) ioctl = {
+ const buf = bufio::fixed(strings::toutf8(params), io::mode::READ);
+ const lex = lex::init(&buf, "<ioctl>");
+
+ const rn = expect(&lex, ltok::LIT_RUNE).1 as rune;
+ expect(&lex, ltok::COMMA);
+ const num = expect(&lex, ltok::LIT_ICONST).1 as i64;
+
+ if (d == dir::IO) {
+ return (d, rn, num: u32, null);
+ };
+
+ expect(&lex, ltok::COMMA);
+ const ty = match (parse::_type(&lex)) {
+ case let ty: ast::_type =>
+ yield ty;
+ case let err: parse::error =>
+ fmt::fatal("Error:", parse::strerror(err));
+ };
+
+ const ty = match (types::lookup(store, &ty)) {
+ case let err: types::error =>
+ fmt::fatal("Error:", types::strerror(err));
+ case types::deferred =>
+ fmt::fatal("Error: this tool does not support forward references");
+ case let ty: const *types::_type =>
+ yield ty;
+ };
+
+ return (d, rn, num: u32, ty);
+};
+
+fn expect(lex: *lex::lexer, want: ltok) lex::token = {
+ match (lex::lex(lex)) {
+ case let err: lex::error =>
+ fmt::fatal("Error:", lex::strerror(err));
+ case let tok: lex::token =>
+ if (tok.0 != want) {
+ fmt::fatalf("Error: unexpected {}", lex::tokstr(tok));
+ };
+ return tok;
+ };
+};
+```
+
+Here we've essentially set up a miniature parser based on a Hare lexer to parse
+our custom parameter list grammar. We create a [fixed reader] from the capture
+group string, then create a lexer based on this and start pulling tokens out of
+it. The first parameter is a rune, so we grab a LIT\_RUNE token and extract the
+Hare rune value from it, then after a COMMA token we repeat this with
+LIT\_ICONST to get the integer constant. dir::IO ioctls don't have a type
+parameter, so we can return early in this case.
+
+[fixed reader]: https://docs.harelang.org/bufio#fixed
+
+Otherwise, we use [hare::parse::\_type] to parse the type parameter, producing
+a [hare::ast::\_type]. We then pass this to the type store to look up technical
+details about this type, such as its size, alignment, storage representation,
+and so on. This converts the AST type, which only has syntactic information,
+into an actual type that also carries semantic information.
+
+[hare::parse::\_type]: https://docs.harelang.org/hare/parse#_type
+[hare::ast::\_type]: https://docs.harelang.org/hare/ast#_type
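+
+As a point of comparison, here is a deliberately simplified C stand-in for
+parseioctl that uses sscanf instead of a real lexer. It is only a sketch. Unlike
+the Hare version, it does not understand rune escapes, and because %63s stops
+at whitespace it cannot handle type names containing spaces. It also returns
+the type as text rather than resolving it to a size.
+
+```c
+#include <stdio.h>
+
+/*
+ * Split a parameter list such as "'T', 213, sock_fprog" into the letter,
+ * the number, and the type name. Returns the number of fields parsed: 2
+ * for an _IO-style list without a type, 3 when a type name is present.
+ */
+static int parse_params(const char *params, char *letter, unsigned *nr,
+			char type[64])
+{
+	return sscanf(params, " '%c' , %u , %63s", letter, nr, type);
+}
+```
+
+It returns 3 for `'T', 213, sock_fprog` and 2 for the type-less `'T', 227`
+used by \_IO.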
+
+Equipped with this information, we can calculate the ioctl's assigned number:
+
+```hare
+def IOC_NRBITS: u32 = 8;
+def IOC_TYPEBITS: u32 = 8;
+def IOC_SIZEBITS: u32 = 14; // XXX: Arch-specific
+def IOC_DIRBITS: u32 = 2; // XXX: Arch-specific
+
+def IOC_NRSHIFT: u32 = 0;
+def IOC_TYPESHIFT: u32 = IOC_NRSHIFT + IOC_NRBITS;
+def IOC_SIZESHIFT: u32 = IOC_TYPESHIFT + IOC_TYPEBITS;
+def IOC_DIRSHIFT: u32 = IOC_SIZESHIFT + IOC_SIZEBITS;
+
+fn ioctlno(io: *ioctl) u32 = {
+ const typesz = match (io.3) {
+ case let ty: const *types::_type =>
+ yield ty.sz;
+ case null =>
+ yield 0z;
+ };
+ return (io.0: u32 << IOC_DIRSHIFT) |
+ (io.1: u32 << IOC_TYPESHIFT) |
+ (io.2 << IOC_NRSHIFT) |
+ (typesz: u32 << IOC_SIZESHIFT);
+};
+```
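+
+Running the arithmetic in reverse is a handy sanity check on generated numbers.
+This small C decoder is a sketch that assumes the same generic field widths as
+the constants above. ioc\_decode and struct ioc\_fields are illustrative names.
+
+```c
+#include <stdint.h>
+
+struct ioc_fields {
+	uint32_t dir, type, nr, size;
+};
+
+/* Split an ioctl number back into its fields, assuming an 8-bit number,
+ * 8-bit type, 14-bit size and 2-bit direction. */
+static struct ioc_fields ioc_decode(uint32_t no)
+{
+	struct ioc_fields f;
+
+	f.nr = no & 0xff;             /* bits 0-7 */
+	f.type = (no >> 8) & 0xff;    /* bits 8-15 */
+	f.size = (no >> 16) & 0x3fff; /* bits 16-29 */
+	f.dir = (no >> 30) & 0x3;     /* bits 30-31 */
+	return f;
+}
+```
+
+Decoding 0x401054d5 (TUNATTACHFILTER) gives direction 1 (write), type 'T',
+number 213, and size 16, the size of sock\_fprog on x86\_64.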
+
+Back in the main loop, we print everything on the line before the "@",
+followed by the computed number and a semicolon:
+
+```hare
+const prefix = strings::sub(line, 0, groups[1].start - 1);
+fmt::printfln("{}0x{:x};", prefix, ioctlno(&ioctl))!;
+```
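+
+Here is the same output step in C, as a sketch with a hypothetical format\_def
+helper. It keeps everything before the "@", appends the number and a
+semicolon, and drops the rest of the line. Note that strchr finds the first "@"
+on the line, which agrees with the regex match as long as no earlier "@"
+appears.
+
+```c
+#include <inttypes.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <string.h>
+
+/* Write the line's text up to the '@', then "0x<no>;", into out. Returns
+ * snprintf's result, or -1 if the line contains no '@'. */
+static int format_def(char *out, size_t len, const char *line, uint32_t no)
+{
+	const char *at = strchr(line, '@');
+
+	if (at == NULL)
+		return -1;
+	return snprintf(out, len, "%.*s0x%" PRIx32 ";",
+			(int)(at - line), line, no);
+}
+```
+
+For example, the TUNSETOWNER line becomes `def TUNSETOWNER: u32 = 0x400454cc;`.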
+
+Now we have successfully converted this:
+
+```hare
+type sock_filter = struct {
+ code: u16,
+ jt: u8,
+ jf: u8,
+ k: u32,
+};
+
+type sock_fprog = struct {
+ length: u16,
+ filter: *sock_filter,
+};
+
+def TUNATTACHFILTER: u32 = @_IOW('T', 213, sock_fprog);
+```
+
+Into this:
+
+```hare
+def TUNATTACHFILTER: u32 = 0x401054d5;
+```
+
+A quick C program verifies our result:
+
+```c
+#include <linux/ioctl.h>
+#include <linux/if_tun.h>
+#include <stdio.h>
+
+int main(void) {
+ printf("TUNATTACHFILTER: 0x%lx\n", (unsigned long)TUNATTACHFILTER);
+ return 0;
+}
+```
+
+And:
+
+```
+TUNATTACHFILTER: 0x401054d5
+```
+
+It works!
+
+---
+
+Critics may draw attention to the fact that we could have saved ourselves much
+of this work if Hare had first-class macros, but macros are not aligned with
+Hare's design goals, so an alternative solution is called for. This particular
+program is useful only in a small set of specific circumstances (and mainly for
+Hare developers themselves, less so for most users), but it solves the problem
+pretty neatly given the constraints it has to work within.
+
+I think this is a nice case study in a few useful features available from the
+Hare standard library. In addition to POSIX Extended Regular Expression support
+via the [regex] module, the [hare namespace] offers many tools to provide Hare
+programs with relatively deep insights into the language itself. We can use
+hare::lex to parse the custom grammar for our pseudo-macros, use hare::parse to
+parse type declarations, and use hare::types to compute the semantic details
+of each type. I also like many of the "little things" on display here, such as
+unreading data back into the buffered stdin reader, or using io::tee to copy
+data to stdout during parsing.
+
+[regex]: https://docs.harelang.org/regex
+
+I hope you found it interesting!