commit: db8cf886f742da56950d51fdabdda3982a6ba4bb
parent ab7ff2821dcb9747596aea7f55af8cf415ca6453
Author: Drew DeVault <sir@cmpwn.com>
Date: Sat, 8 Apr 2023 17:21:10 +0200
Drivers and mercury
Diffstat:
1 file changed, 614 insertions(+), 0 deletions(-)
diff --git a/content/blog/2023-04-08-Drivers-and-mercury.md b/content/blog/2023-04-08-Drivers-and-mercury.md
@@ -0,0 +1,614 @@
+---
+title: Writing Helios drivers in the Mercury driver environment
+date: 2023-04-08
+---
+
+*[Helios] is a microkernel written in the [Hare] programming language and is
+part of the larger [Ares](https://ares-os.org) operating system. You can watch
+my FOSDEM 2023 talk introducing Helios [on PeerTube][0].*
+
+[0]: https://spacepub.space/w/wpKXfhqqr7FajEAf4B2Vc2
+[Helios]: https://git.sr.ht/~sircmpwn/helios
+[Hare]: https://harelang.org
+
+Let's take a look at the new Mercury driver development environment for Helios.
+
+As you may remember from my FOSDEM talk, the Ares operating system is built out
+of several layers which provide progressively higher-level environments for an
+operating system. At the bottom is the Helios microkernel, and today we're going
+to talk about the second layer: the [Mercury] environment, which is used for
+writing and running device drivers in userspace. We'll walk through a serial
+driver written against Mercury and introduce some of the primitives used by
+driver authors in the Mercury environment.
+
+[Mercury]: https://git.sr.ht/~sircmpwn/mercury
+
+Drivers for Mercury are written as normal ELF executables with an extra section
+called .manifest, which includes a file similar to the following (the provided
+example is for the serial driver we'll be examining today):
+
+```ini
+[driver]
+name=pcserial
+desc=Serial driver for x86_64 PCs
+
+[capabilities]
+0:ioport = min=3F8, max=400
+1:ioport = min=2E8, max=2F0
+2:note =
+3:irq = irq=3, note=2
+4:irq = irq=4, note=2
+_:cspace = self
+_:vspace = self
+_:memory = pages=32
+
+[services]
+devregistry=
+```
+
+Helios uses a capability-based design, in which access to system resources (such
+as I/O ports, IRQs, or memory) is governed by capability objects. Each process
+has a *capability space*, which is a table of capabilities assigned to that
+process, and when performing operations (such as writing to an I/O port) the
+user provides the index of the desired capability in a register when invoking
+the appropriate syscall.
+
+The manifest first specifies a list of capabilities required to operate the
+serial port. It requests capabilities for the required I/O ports and IRQs, as
+well as a notification object to which the IRQs will be delivered, and assigns
+each of them a static capability address. Some capability types, such as I/O
+ports, have configuration parameters, in this case the minimum and maximum port
+numbers which are relevant. The IRQ capabilities require a reference to a
+notification as well.
+
+Limiting access to these capabilities provides very strong isolation between
+device drivers. On a monolithic kernel like Linux, a bug in the serial driver
+could compromise the entire system, but a vulnerability in our driver could, at
+worst, write garbage to your serial port. This model also provides better
+security than something like OpenBSD's pledge by declaratively specifying what
+we need and nothing else.
+
+Following the statically allocated capabilities, we request our own capability
+space and virtual address space, the former so we can copy and destroy our
+capabilities, and the latter so that we can map shared memory to perform reads
+and writes for clients. We also request 32 pages of memory, which we use to
+allocate page tables to perform those mappings; this will be changed later.
+These capabilities do not require any specific address for the driver to work,
+so we use "\_" to indicate that any slot will suit our needs.
+
+Mercury uses some vendor extensions over the System-V ABI to communicate
+information about these capabilities to the runtime. Notes about each of the
+\_'d capabilities are provided by the auxiliary vector, and picked up by the
+Mercury runtime -- for instance, the presence of a memory capability is detected
+on startup and is used to set up the allocator; the presence of a vspace
+capability is automatically wired up to the mmap implementation.
+
+Each of these capabilities is implemented by the kernel, but additional services
+are available in userspace via endpoint capabilities. Each of these endpoints
+implements a particular API, as defined by a protocol definition file. This
+driver requires access to the device registry, so that it can create devices for
+its serial ports and expose them to clients.
+
+These protocol definitions are written in a domain-specific language and parsed
+by [ipcgen] to generate client and server implementations of each. Here's a
+simple protocol to start us off:
+
+[ipcgen]: https://git.sr.ht/~sircmpwn/ipcgen
+
+```
+namespace io;
+
+# The location with respect to which a seek operation is performed.
+enum whence {
+ # From the start of the file
+ SET,
+ # From the current offset
+ CUR,
+ # From the end of the file
+ END,
+};
+
+# An object with file-like semantics.
+interface file {
+ # Reads up to amt bytes of data from a file.
+ call read{pages: page...}(buf: uintptr, amt: size) size;
+
+ # Writes up to amt bytes of data to a file.
+ call write{pages: page...}(buf: uintptr, amt: size) size;
+
+ # Seeks a file to a given offset, returning the new offset.
+ call seek(offs: i64, w: whence) size;
+};
+```
+
+Each interface includes a list of methods, each of which can take a number of
+capabilities and parameters, and return a value. The "read" call here, when
+implemented by a file-like object, accepts a list of memory pages to perform the
+read with (shared memory), as well as the buffer's address and size. Error
+handling is still a to-do.
+
+ipcgen consumes these files and writes client or server code as appropriate.
+These are generated as part of the Mercury build process and end up in
+\*\_gen.ha files. The generated client code is filed away into the relevant
+modules (this protocol ends up at io/file\_gen.ha), alongside various
+hand-written files which provide additional functionality and often wrap the IPC
+calls in a higher-level interface. The server implementations end up in the
+"serv" module, e.g. serv/io/file\_gen.ha.
+
+Let's look at some of the generated client code for io::file objects:
+
+```hare
+// This file was generated by ipcgen; do not modify by hand
+use helios;
+use rt;
+
+// ID for the file IPC interface.
+export def FILE_ID: u32 = 0x9A533BB3;
+
+// Labels for operations against file objects.
+export type file_label = enum u64 {
+ READ = FILE_ID << 16u64 | 1,
+ WRITE = FILE_ID << 16u64 | 2,
+ SEEK = FILE_ID << 16u64 | 3,
+};
+
+export fn file_read(
+ ep: helios::cap,
+ pages: []helios::cap,
+ buf: uintptr,
+ amt: size,
+) size = {
+ // ...
+};
+```
+
+Each interface has a unique ID (generated from the FNV-1a hash of its fully
+qualified name), which is shifted left 16 bits and OR'd with each operation's
+number to form call labels. The interface ID is used again later on. Then
+each method generates an implementation which arranges the IPC details as
+necessary and invokes the "call" syscall against the endpoint capability.
+
+The generated server code is a bit more involved. Some of the details are
+similar -- FILE\_ID is generated again, for instance -- but there are some
+additional details as well. First is the generation of a vtable defining the
+functions implementing each operation:
+
+```hare
+// Implementation of a [[file]] object.
+export type file_iface = struct {
+ read: *fn_file_read,
+ write: *fn_file_write,
+ seek: *fn_file_seek,
+};
+```
+
+We also define a file object which is subtyped by the implementation to store
+implementation details, and which provides to the generated code the required
+bits of state.
+
+```hare
+// Instance of a file object. Users may subtype this object to add
+// instance-specific state.
+export type file = struct {
+ _iface: *file_iface,
+ _endpoint: helios::cap,
+};
+```
+
+Here's an example of a subtype of file used by the initramfs to store additional
+state:
+
+```hare
+// An open file in the bootstrap filesystem
+type bfs_file = struct {
+ serv::io::file,
+ fs: *bfs,
+ ent: tar::entry,
+ cur: io::off,
+ padding: size,
+};
+```
+
+The embedded serv::io::file structure is populated with an implementation of
+file\_iface, simplified here for illustrative purposes:
+
+```hare
+const bfs_file_impl = serv_io::file_iface {
+ read = &bfs_file_read,
+ write = &bfs_file_write,
+ seek = &bfs_file_seek,
+};
+
+fn bfs_file_read(
+ obj: *serv_io::file,
+ pages: []helios::cap,
+ buf: uintptr,
+ amt: size,
+) size = {
+ let file = obj: *bfs_file;
+ const fs = file.fs;
+ const offs = (buf & rt::PAGEMASK): size;
+ defer helios::destroy(pages...)!;
+
+ assert(offs + amt <= len(pages) * rt::PAGESIZE);
+ const buf = helios::map(rt::vspace, 0, map_flags::W, pages...)!: *[*]u8;
+
+ let buf = buf[offs..offs+amt];
+ // Not shown: reading the file data into this buffer
+};
+```
+
+The implementation can prepare a file object and call dispatch on it to process
+client requests: this function blocks until a request arrives, decodes it, and
+invokes the appropriate function. Often this is incorporated into an event loop
+with poll to service many objects at once.
+
+```hare
+// Prepare a file object
+const ep = helios::newendpoint()!;
+append(fs.files, bfs_file {
+ _iface = &bfs_file_impl,
+ _endpoint = ep,
+ fs = fs,
+ ent = ent,
+ cur = io::tell(fs.buf)!,
+ padding = fs.rd.padding,
+});
+
+// ...
+
+// Process requests associated with this file
+serv::io::file_dispatch(file);
+```
+
+Okay, enough background: back to the serial driver. It needs to implement the
+following protocol:
+
+```
+namespace dev;
+use io;
+
+# TODO: Add busy error and narrow semantics
+
+# Note: TWO is interpreted as 1.5 for some char lengths (5)
+enum stop_bits {
+ ONE,
+ TWO,
+};
+
+enum parity {
+ NONE,
+ ODD,
+ EVEN,
+ MARK,
+ SPACE,
+};
+
+# A serial device, which implements the file interface for reading from and
+# writing to a serial port. Typical implementations may only support one read
+# in-flight at a time, returning errors::busy otherwise.
+interface serial :: io::file {
+ # Returns the baud rate in Hz.
+ call get_baud() uint;
+
+ # Returns the configured number of bits per character.
+ call get_charlen() uint;
+
+ # Returns the configured number of stop bits.
+ call get_stopbits() stop_bits;
+
+ # Returns the configured parity setting.
+ call get_parity() parity;
+
+ # Sets the baud rate in Hz.
+ call set_baud(hz: uint) void;
+
+ # Sets the number of bits per character. Must be 5, 6, 7, or 8.
+ call set_charlen(bits: uint) void;
+
+ # Configures the number of stop bits to use.
+ call set_stopbits(bits: stop_bits) void;
+
+ # Configures the desired parity.
+ call set_parity(parity: parity) void;
+};
+```
+
+This protocol *inherits* the io::file interface, so the serial port is usable
+like any other file for reads and writes. It additionally defines
+serial-specific methods, such as configuring the baud rate or parity. The
+generated interface we'll have to implement looks something like this, embedding
+the io::file\_iface struct:
+
+```hare
+export type serial_iface = struct {
+ io::file_iface,
+ get_baud: *fn_serial_get_baud,
+ get_charlen: *fn_serial_get_charlen,
+ get_stopbits: *fn_serial_get_stopbits,
+ get_parity: *fn_serial_get_parity,
+ set_baud: *fn_serial_set_baud,
+ set_charlen: *fn_serial_set_charlen,
+ set_stopbits: *fn_serial_set_stopbits,
+ set_parity: *fn_serial_set_parity,
+};
+```
+
+Time to dive into the implementation. Recall the driver manifest, which provides
+the serial driver with a suitable environment:
+
+```ini
+[driver]
+name=pcserial
+desc=Serial driver for x86_64 PCs
+
+[capabilities]
+0:ioport = min=3F8, max=400
+1:ioport = min=2E8, max=2F0
+2:note =
+3:irq = irq=3, note=2
+4:irq = irq=4, note=2
+_:cspace = self
+_:vspace = self
+_:memory = pages=32
+
+[services]
+devregistry=
+```
+
+This gives us I/O ports for reading and writing to the serial devices, IRQs for
+receiving serial-related interrupts, a device registry to add our serial devices
+to the system, and a few extra things for implementation needs. Some of these
+are statically allocated; others are provided via the auxiliary vector.
+Our [serial driver][driver] opens by defining constants for the statically
+allocated capabilities:
+
+[driver]: https://git.sr.ht/~sircmpwn/mercury/tree/5e12977a0cb773331b9b3b8421da63b85eed232c/item/cmd/serial
+
+```hare
+def IOPORT_A: helios::cap = 0;
+def IOPORT_B: helios::cap = 1;
+def IRQ: helios::cap = 2;
+def IRQ3: helios::cap = 3;
+def IRQ4: helios::cap = 4;
+```
+
+The first thing we do on startup is create a serial device.
+
+```hare
+export fn main() void = {
+ let serial0: helios::cap = 0;
+ const registry = helios::service(sys::DEVREGISTRY_ID);
+ sys::devregistry_new(registry, dev::SERIAL_ID, &serial0);
+ helios::destroy(registry)!;
+ // ...
+```
+
+The device registry is provided via the aux vector, and we can use
+helios::service to look it up by its interface ID. Then we use the
+devregistry::new operation to create a serial device:
+
+```
+# Device driver registry.
+interface devregistry {
+ # Creates a new device implementing the given interface ID using the
+ # provided endpoint capability and returns its assigned serial number.
+ call new{; out}(iface: u64) uint;
+};
+```
+
+After this we can destroy the registry -- we won't need it again and it's best
+to get rid of it so that we can work with the minimum possible privileges at
+runtime. Next we initialize the serial port, acknowledge any interrupts that
+might have been pending before we got started, and enter the main loop.
+
+```hare
+com_init(&ports[0], serial0);
+
+helios::irq_ack(IRQ3)!;
+helios::irq_ack(IRQ4)!;
+
+let poll: [_]pollcap = [
+ pollcap { cap = IRQ, events = pollflags::RECV, ... },
+ pollcap { cap = serial0, events = pollflags::RECV, ... },
+];
+for (true) {
+ helios::poll(poll)!;
+ if (poll[0].revents & pollflags::RECV != 0) {
+ dispatch_irq();
+ };
+ if (poll[1].revents & pollflags::RECV != 0) {
+ dispatch_serial(&ports[0]);
+ };
+};
+```
+
+The dispatch\_serial function is of interest, as this provides the
+implementation of the serial object we just created with the device registry.
+
+```hare
+type comport = struct {
+ dev::serial,
+ port: u16,
+ rbuf: [4096]u8,
+ wbuf: [4096]u8,
+ rpending: []u8,
+ wpending: []u8,
+};
+
+fn dispatch_serial(dev: *comport) void = {
+ dev::serial_dispatch(dev);
+};
+
+const serial_impl = dev::serial_iface {
+ read = &serial_read,
+ write = &serial_write,
+ seek = &serial_seek,
+ get_baud = &serial_get_baud,
+ get_charlen = &serial_get_charlen,
+ get_stopbits = &serial_get_stopbits,
+ get_parity = &serial_get_parity,
+ set_baud = &serial_set_baud,
+ set_charlen = &serial_set_charlen,
+ set_stopbits = &serial_set_stopbits,
+ set_parity = &serial_set_parity,
+};
+
+fn serial_read(
+ obj: *io::file,
+ pages: []helios::cap,
+ buf: uintptr,
+ amt: size,
+) size = {
+ const port = obj: *comport;
+ const offs = (buf & rt::PAGEMASK): size;
+ const buf = helios::map(rt::vspace, 0, map_flags::W, pages...)!: *[*]u8;
+ const buf = buf[offs..offs+amt];
+
+ if (len(port.rpending) != 0) {
+ defer helios::destroy(pages...)!;
+ return rconsume(port, buf);
+ };
+
+ pages_static[..len(pages)] = pages[..];
+ pending_read = read {
+ reply = helios::store_reply(helios::CADDR_UNDEF)!,
+ pages = pages_static[..len(pages)],
+ buf = buf,
+ };
+ return 0;
+};
+
+// (other functions omitted)
+```
+
+We'll skip much of the implementation details for this specific driver, but I'll
+show you how read works at least. It's relatively straightforward: first we mmap
+the buffer provided by the caller. If there's already readable data pending from
+the serial port (stored in that rpending slice in the comport struct, which is a
+slice of the statically-allocated rbuf field), we copy it into the buffer and
+return the number of bytes we had ready. Otherwise, we stash details about the
+caller, storing the special reply capability in our cspace (this is one of the
+reasons we need cspace = self in our manifest) so we can reply to this call
+once data is available. Then we return to the main loop.
+
+The main loop also wakes up on an interrupt, and we have an interrupt unmasked
+on the serial device to wake us whenever there's data ready to be read.
+Eventually this gets us here, which finishes the call we saved earlier:
+
+```hare
+// Reads data from the serial port's RX FIFO.
+fn com_read(com: *comport) size = {
+ let n: size = 0;
+ for (comin(com.port, LSR) & RBF == RBF; n += 1) {
+ const ch = comin(com.port, RBR);
+ if (len(com.rpending) < len(com.rbuf)) {
+ // If the buffer is full we just drop chars
+ static append(com.rpending, ch);
+ };
+ };
+
+ if (pending_read.reply != 0) {
+ const n = rconsume(com, pending_read.buf);
+ helios::send(pending_read.reply, 0, n)!;
+ pending_read.reply = 0;
+ helios::destroy(pending_read.pages...)!;
+ };
+
+ return n;
+};
+```
+
+I hope that gives you a general idea of how drivers work in this environment!
+I encourage you to read the full implementation if you're curious to know more
+about the serial driver in particular -- it's just 370 lines of code.
+
+The last thing I want to show you is how the driver gets executed in the first
+place. When Helios boots up, it starts /sbin/sysinit, which is provided by
+Mercury and offers various low-level userspace runtime services, such as the
+device registry and bootstrap filesystem we saw earlier. After setting up its
+services, sysinit executes /sbin/usrinit, which is provided by the next layer
+up (Gaia, eventually) and sets up the rest of the system according to user
+policy, mounting filesystems and starting up drivers and such. At the moment,
+usrinit is fairly simple, and just runs a little demo. Here it is in full:
+
+```hare
+use dev;
+use fs;
+use helios;
+use io;
+use log;
+use rt;
+use sys;
+
+export fn main() void = {
+ const fs = helios::service(fs::FS_ID);
+ const procmgr = helios::service(sys::PROCMGR_ID);
+ const devmgr = helios::service(sys::DEVMGR_ID);
+ const devload = helios::service(sys::DEVLOADER_ID);
+
+ log::printfln("[usrinit] Running /sbin/drv/serial");
+ let proc: helios::cap = 0;
+ const image = fs::open(fs, "/sbin/drv/serial")!;
+ sys::procmgr_new(procmgr, &proc);
+ sys::devloader_load(devload, proc, image);
+ sys::process_start(proc);
+
+ let serial: helios::cap = 0;
+ log::printfln("[usrinit] open device serial0");
+ sys::devmgr_open(devmgr, dev::SERIAL_ID, 0, &serial);
+
+ let buf: [rt::PAGESIZE]u8 = [0...];
+ for (true) {
+ const n = match (io::read(serial, buf)!) {
+ case let n: size =>
+ yield n;
+ case io::EOF =>
+ break;
+ };
+
+ // CR => LF
+ for (let i = 0z; i < n; i += 1) {
+ if (buf[i] == '\r') {
+ buf[i] = '\n';
+ };
+ };
+
+ // echo
+ io::write(serial, buf[..n])!;
+ };
+};
+```
+
+Each of the services shown at the start is automatically provided in usrinit's
+aux vector by sysinit; together they include all of the services required to
+bootstrap the system: a filesystem (the initramfs), a process manager (to start
+up new processes), the device manager, and the driver loader service.
+
+usrinit starts by opening up /sbin/drv/serial (the serial driver, of course)
+from the provided initramfs using fs::open, which is a convenience wrapper
+around the filesystem protocol. Then we create a new process with the process
+manager, which by default has an empty address space -- we could load a normal
+process into it with sys::process\_load, but we want to load a driver, so we
+use the devloader interface instead. Then we start the process and boom: the
+serial driver is online.
+
+The serial driver registers itself with the device registry, which means that we
+can use the device manager to open the 0th device which implements the serial
+interface. Since this is compatible with the io::file interface, it can simply
+be used normally with io::read and io::write to utilize the serial port. The
+main loop simply echoes data read from the serial port back out. Simple!
+
+---
+
+That's a quick introduction to the driver environment provided by Mercury. I
+intend to write a few more drivers soon myself -- PC keyboard, framebuffer,
+etc -- and set up a simple shell. We have seen a few sample drivers written
+pre-Mercury which would be nice to bring into this environment, such as virtio
+networking and block devices. It will be nice to see them re-introduced in an
+environment where they can provide useful services to the rest of userspace.
+
+If you're interested in learning more about Helios or Mercury, consult
+[ares-os.org](https://ares-os.org) for documentation -- though beware of the
+many stub pages. If you have any questions or want to get involved in writing
+some drivers yourself, jump into our IRC channel: #helios on Libera Chat.