drewdevault.com

[mirror] blog and personal website of Drew DeVault git clone https://hacktivis.me/git/mirror/drewdevault.com.git
commit: db8cf886f742da56950d51fdabdda3982a6ba4bb
parent ab7ff2821dcb9747596aea7f55af8cf415ca6453
Author: Drew DeVault <sir@cmpwn.com>
Date:   Sat,  8 Apr 2023 17:21:10 +0200

Drivers and mercury

Diffstat:

A content/blog/2023-04-08-Drivers-and-mercury.md | 614 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 614 insertions(+), 0 deletions(-)

diff --git a/content/blog/2023-04-08-Drivers-and-mercury.md b/content/blog/2023-04-08-Drivers-and-mercury.md
@@ -0,0 +1,614 @@
+---
+title: Writing Helios drivers in the Mercury driver environment
+date: 2023-04-08
+---
+
+*[Helios] is a microkernel written in the [Hare] programming language and is
+part of the larger [Ares](https://ares-os.org) operating system. You can watch
+my FOSDEM 2023 talk introducing Helios [on PeerTube][0].*
+
+[0]: https://spacepub.space/w/wpKXfhqqr7FajEAf4B2Vc2
+[Helios]: https://git.sr.ht/~sircmpwn/helios
+[Hare]: https://harelang.org
+
+Let's take a look at the new Mercury driver development environment for Helios.
+
+As you may remember from my FOSDEM talk, the Ares operating system is built out
+of several layers which provide progressively higher-level environments for an
+operating system. At the bottom is the Helios microkernel, and today we're going
+to talk about the second layer: the [Mercury] environment, which is used for
+writing and running device drivers in userspace. Let's take a look at a serial
+driver written against Mercury and introduce some of the primitives used by
+driver authors in the Mercury environment.
+
+[Mercury]: https://git.sr.ht/~sircmpwn/mercury
+
+Drivers for Mercury are written as normal ELF executables with an extra section
+called .manifest, which includes a file similar to the following (the provided
+example is for the serial driver we'll be examining today):
+
+```ini
+[driver]
+name=pcserial
+desc=Serial driver for x86_64 PCs
+
+[capabilities]
+0:ioport = min=3F8, max=400
+1:ioport = min=2E8, max=2F0
+2:note =
+3:irq = irq=3, note=2
+4:irq = irq=4, note=2
+_:cspace = self
+_:vspace = self
+_:memory = pages=32
+
+[services]
+devregistry=
+```
+
+Helios uses a capability-based design, in which access to system resources (such
+as I/O ports, IRQs, or memory) is governed by capability objects. Each process
+has a *capability space*, which is a table of capabilities assigned to that
+process, and when performing operations (such as writing to an I/O port) the
+user provides the index of the desired capability in a register when invoking
+the appropriate syscall.
+
+The manifest first specifies a list of capabilities required to operate the
+serial port. It requests, at statically assigned addresses, capabilities for
+the required I/O ports and IRQs, as well as a notification object which the IRQs
+will be delivered to. Some capability types, such as I/O ports, have
+configuration parameters, in this case the minimum and maximum port numbers
+which are relevant. The IRQ capabilities require a reference to a notification
+as well.
+
+Limiting access to these capabilities provides very strong isolation between
+device drivers. On a monolithic kernel like Linux, a bug in the serial driver
+could compromise the entire system, but a vulnerability in our driver could, at
+worst, write garbage to your serial port. This model also provides better
+security than something like OpenBSD's pledge by declaratively specifying what
+we need and nothing else.
+
+Following the statically allocated capabilities, we request our own capability
+space and virtual address space, the former so we can copy and destroy our
+capabilities, and the latter so that we can map shared memory to perform reads
+and writes for clients. We also request 32 pages of memory, which we use to
+allocate page tables to perform those mappings; this will be changed later.
+These capabilities do not require any specific address for the driver to work,
+so we use "\_" to indicate that any slot will suit our needs.
+
+Mercury uses some vendor extensions over the System-V ABI to communicate
+information about these capabilities to the runtime. Notes about each of the
+\_'d capabilities are provided by the auxiliary vector, and picked up by the
+Mercury runtime -- for instance, the presence of a memory capability is detected
+on startup and is used to set up the allocator; the presence of a vspace
+capability is automatically wired up to the mmap implementation.
+
+Each of these capabilities is implemented by the kernel, but additional services
+are available in userspace via endpoint capabilities. Each of these endpoints
+implements a particular API, as defined by a protocol definition file. This
+driver requires access to the device registry, so that it can create devices for
+its serial ports and expose them to clients.
+
+These protocol definitions are written in a domain-specific language and parsed
+by [ipcgen] to generate client and server implementations of each. Here's a
+simple protocol to start us off:
+
+[ipcgen]: https://git.sr.ht/~sircmpwn/ipcgen
+
+```
+namespace io;
+
+# The location with respect to which a seek operation is performed.
+enum whence {
+	# From the start of the file
+	SET,
+	# From the current offset
+	CUR,
+	# From the end of the file
+	END,
+};
+
+# An object with file-like semantics.
+interface file {
+	# Reads up to amt bytes of data from a file.
+	call read{pages: page...}(buf: uintptr, amt: size) size;
+
+	# Writes up to amt bytes of data to a file.
+	call write{pages: page...}(buf: uintptr, amt: size) size;
+
+	# Seeks a file to a given offset, returning the new offset.
+	call seek(offs: i64, w: whence) size;
+};
+```
+
+Each interface includes a list of methods, each of which can take a number of
+capabilities and parameters, and return a value. The "read" call here, when
+implemented by a file-like object, accepts a list of memory pages to perform the
+read or write with (shared memory), as well as a pointer to the buffer address
+and size. Error handling is still a to-do.
+
+ipcgen consumes these files and writes client or server code as appropriate.
+These are generated as part of the Mercury build process and end up in
+\*\_gen.ha files. The generated client code is filed away into the relevant
+modules (this protocol ends up at io/file\_gen.ha), alongside various
+hand-written files which provide additional functionality and often wrap the IPC
+calls in a higher-level interface. The server implementations end up in the
+"serv" module, e.g. serv/io/file\_gen.ha.
+
+Let's look at some of the generated client code for io::file objects:
+
+```hare
+// This file was generated by ipcgen; do not modify by hand
+use helios;
+use rt;
+
+// ID for the file IPC interface.
+export def FILE_ID: u32 = 0x9A533BB3;
+
+// Labels for operations against file objects.
+export type file_label = enum u64 {
+	READ = FILE_ID << 16u64 | 1,
+	WRITE = FILE_ID << 16u64 | 2,
+	SEEK = FILE_ID << 16u64 | 3,
+};
+
+export fn file_read(
+	ep: helios::cap,
+	pages: []helios::cap,
+	buf: uintptr,
+	amt: size,
+) size = {
+	// ...
+};
+```
+
+Each interface has a unique ID (generated from the FNV-1a hash of its fully
+qualified name), which is bitwise-OR'd with a list of operations to form call
+labels. The interface ID is used elsewhere; we'll refer to it again later. Then
+each method generates an implementation which arranges the IPC details as
+necessary and invokes the "call" syscall against the endpoint capability.
+
+The generated server code is a bit more involved. Some of the details are
+similar -- FILE\_ID is generated again, for instance -- but there are some
+additional details as well. First is the generation of a vtable defining the
+functions implementing each operation:
+
+```hare
+// Implementation of a [[file]] object.
+export type file_iface = struct {
+	read: *fn_file_read,
+	write: *fn_file_write,
+	seek: *fn_file_seek,
+};
+```
+
+We also define a file object which is subtyped by the implementation to store
+implementation details, and which provides to the generated code the required
+bits of state.
+
+```hare
+// Instance of a file object. Users may subtype this object to add
+// instance-specific state.
+export type file = struct {
+	_iface: *file_iface,
+	_endpoint: helios::cap,
+};
+```
+
+Here's an example of a subtype of file used by the initramfs to store additional
+state:
+
+```hare
+// An open file in the bootstrap filesystem
+type bfs_file = struct {
+	serv::io::file,
+	fs: *bfs,
+	ent: tar::entry,
+	cur: io::off,
+	padding: size,
+};
+```
+
+The embedded serv::io::file structure here is populated with an implementation
+of file\_iface, here simplified for illustrative purposes:
+
+```hare
+const bfs_file_impl = serv_io::file_iface {
+	read = &bfs_file_read,
+	write = &bfs_file_write,
+	seek = &bfs_file_seek,
+};
+
+fn bfs_file_read(
+	obj: *serv_io::file,
+	pages: []helios::cap,
+	buf: uintptr,
+	amt: size,
+) size = {
+	let file = obj: *bfs_file;
+	const fs = file.fs;
+	const offs = (buf & rt::PAGEMASK): size;
+	defer helios::destroy(pages...)!;
+
+	assert(offs + amt <= len(pages) * rt::PAGESIZE);
+	const buf = helios::map(rt::vspace, 0, map_flags::W, pages...)!: *[*]u8;
+
+	let buf = buf[offs..offs+amt];
+	// Not shown: reading the file data into this buffer
+};
+```
+
+The implementation can prepare a file object and call dispatch on it to process
+client requests: this function blocks until a request arrives, decodes it, and
+invokes the appropriate function. Often this is incorporated into an event loop
+with poll to service many objects at once.
+
+```hare
+// Prepare a file object
+const ep = helios::newendpoint()!;
+append(fs.files, bfs_file {
+	_iface = &bfs_file_impl,
+	_endpoint = ep,
+	fs = fs,
+	ent = ent,
+	cur = io::tell(fs.buf)!,
+	padding = fs.rd.padding,
+});
+
+// ...
+
+// Process requests associated with this file
+serv::io::file_dispatch(file);
+```
+
+Okay, enough background: back to the serial driver. It needs to implement the
+following protocol:
+
+```
+namespace dev;
+use io;
+
+# TODO: Add busy error and narrow semantics
+
+# Note: TWO is interpreted as 1.5 for some char lengths (5)
+enum stop_bits {
+	ONE,
+	TWO,
+};
+
+enum parity {
+	NONE,
+	ODD,
+	EVEN,
+	MARK,
+	SPACE,
+};
+
+# A serial device, which implements the file interface for reading from and
+# writing to a serial port. Typical implementations may only support one read
+# in-flight at a time, returning errors::busy otherwise.
+interface serial :: io::file {
+	# Returns the baud rate in Hz.
+	call get_baud() uint;
+
+	# Returns the configured number of bits per character.
+	call get_charlen() uint;
+
+	# Returns the configured number of stop bits.
+	call get_stopbits() stop_bits;
+
+	# Returns the configured parity setting.
+	call get_parity() parity;
+
+	# Sets the baud rate in Hz.
+	call set_baud(hz: uint) void;
+
+	# Sets the number of bits per character. Must be 5, 6, 7, or 8.
+	call set_charlen(bits: uint) void;
+
+	# Configures the number of stop bits to use.
+	call set_stopbits(bits: stop_bits) void;
+
+	# Configures the desired parity.
+	call set_parity(parity: parity) void;
+};
+```
+
+This protocol *inherits* the io::file interface, so the serial port is usable
+like any other file for reads and writes. It additionally defines
+serial-specific methods, such as configuring the baud rate or parity. The
+generated interface we'll have to implement looks something like this, embedding
+the io::file\_iface struct:
+
+```hare
+export type serial_iface = struct {
+	io::file_iface,
+	get_baud: *fn_serial_get_baud,
+	get_charlen: *fn_serial_get_charlen,
+	get_stopbits: *fn_serial_get_stopbits,
+	get_parity: *fn_serial_get_parity,
+	set_baud: *fn_serial_set_baud,
+	set_charlen: *fn_serial_set_charlen,
+	set_stopbits: *fn_serial_set_stopbits,
+	set_parity: *fn_serial_set_parity,
+};
+```
+
+Time to dive into the implementation. Recall the driver manifest, which provides
+the serial driver with a suitable environment:
+
+```
+[driver]
+name=pcserial
+desc=Serial driver for x86_64 PCs
+
+[capabilities]
+0:ioport = min=3F8, max=400
+1:ioport = min=2E8, max=2F0
+2:note =
+3:irq = irq=3, note=2
+4:irq = irq=4, note=2
+_:cspace = self
+_:vspace = self
+_:memory = pages=32
+
+[services]
+devregistry=
+```
+
+I/O ports for reading and writing to the serial devices, IRQs for receiving
+serial-related interrupts, a device registry to add our serial devices to the
+system, and a few extra things for implementation needs. Some of these are
+statically allocated, some of them are provided via the auxiliary vector.
+Our [serial driver][driver] opens by defining constants for the statically
+allocated capabilities:
+
+[driver]: https://git.sr.ht/~sircmpwn/mercury/tree/5e12977a0cb773331b9b3b8421da63b85eed232c/item/cmd/serial
+
+```hare
+def IOPORT_A: helios::cap = 0;
+def IOPORT_B: helios::cap = 1;
+def IRQ: helios::cap = 2;
+def IRQ3: helios::cap = 3;
+def IRQ4: helios::cap = 4;
+```
+
+The first thing we do on startup is create a serial device.
+
+```hare
+export fn main() void = {
+	let serial0: helios::cap = 0;
+	const registry = helios::service(sys::DEVREGISTRY_ID);
+	sys::devregistry_new(registry, dev::SERIAL_ID, &serial0);
+	helios::destroy(registry)!;
+	// ...
+```
+
+The device registry is provided via the aux vector, and we can use
+helios::service to look it up by its interface ID. Then we use the
+devregistry::new operation to create a serial device:
+
+```
+# Device driver registry.
+interface devregistry {
+	# Creates a new device implementing the given interface ID using the
+	# provided endpoint capability and returns its assigned serial number.
+	call new{; out}(iface: u64) uint;
+};
+```
+
+After this we can destroy the registry -- we won't need it again and it's best
+to get rid of it so that we can work with the minimum possible privileges at
+runtime. After this we initialize the serial port, acknowledge any interrupts
+that might have been pending before we got started, and enter the main loop.
+
+```hare
+com_init(&ports[0], serial0);
+
+helios::irq_ack(IRQ3)!;
+helios::irq_ack(IRQ4)!;
+
+let poll: [_]pollcap = [
+	pollcap { cap = IRQ, events = pollflags::RECV, ... },
+	pollcap { cap = serial0, events = pollflags::RECV, ... },
+];
+for (true) {
+	helios::poll(poll)!;
+	if (poll[0].revents & pollflags::RECV != 0) {
+		dispatch_irq();
+	};
+	if (poll[1].revents & pollflags::RECV != 0) {
+		dispatch_serial(&ports[0]);
+	};
+};
+```
+
+The dispatch\_serial function is of interest, as this provides the
+implementation of the serial object we just created with the device registry.
+
+```hare
+type comport = struct {
+	dev::serial,
+	port: u16,
+	rbuf: [4096]u8,
+	wbuf: [4096]u8,
+	rpending: []u8,
+	wpending: []u8,
+};
+
+fn dispatch_serial(dev: *comport) void = {
+	dev::serial_dispatch(dev);
+};
+
+const serial_impl = dev::serial_iface {
+	read = &serial_read,
+	write = &serial_write,
+	seek = &serial_seek,
+	get_baud = &serial_get_baud,
+	get_charlen = &serial_get_charlen,
+	get_stopbits = &serial_get_stopbits,
+	get_parity = &serial_get_parity,
+	set_baud = &serial_set_baud,
+	set_charlen = &serial_set_charlen,
+	set_stopbits = &serial_set_stopbits,
+	set_parity = &serial_set_parity,
+};
+
+fn serial_read(
+	obj: *io::file,
+	pages: []helios::cap,
+	buf: uintptr,
+	amt: size,
+) size = {
+	const port = obj: *comport;
+	const offs = (buf & rt::PAGEMASK): size;
+	const buf = helios::map(rt::vspace, 0, map_flags::W, pages...)!: *[*]u8;
+	const buf = buf[offs..offs+amt];
+
+	if (len(port.rpending) != 0) {
+		defer helios::destroy(pages...)!;
+		return rconsume(port, buf);
+	};
+
+	pages_static[..len(pages)] = pages[..];
+	pending_read = read {
+		reply = helios::store_reply(helios::CADDR_UNDEF)!,
+		pages = pages_static[..len(pages)],
+		buf = buf,
+	};
+	return 0;
+};
+
+// (other functions omitted)
+```
+
+We'll skip much of the implementation details for this specific driver, but I'll
+show you how read works at least. It's relatively straightforward: first we mmap
+the buffer provided by the caller. If there's already readable data pending from
+the serial port (stored in that rpending slice in the comport struct, which is a
+slice of the statically-allocated rbuf field), we copy it into the buffer and
+return the number of bytes we had ready. Otherwise, we stash details about the
+caller, storing the special reply capability in our cspace (this is one of the
+reasons we need cspace = self in our manifest) so we can reply to this call
+once data is available. Then we return to the main loop.
+
+The main loop also wakes up on an interrupt, and we have an interrupt unmasked
+on the serial device to wake us whenever there's data ready to be read.
+Eventually this gets us here, which finishes the call we saved earlier:
+
+```hare
+// Reads data from the serial port's RX FIFO.
+fn com_read(com: *comport) size = {
+	let n: size = 0;
+	for (comin(com.port, LSR) & RBF == RBF; n += 1) {
+		const ch = comin(com.port, RBR);
+		if (len(com.rpending) < len(com.rbuf)) {
+			// If the buffer is full we just drop chars
+			static append(com.rpending, ch);
+		};
+	};
+
+	if (pending_read.reply != 0) {
+		const n = rconsume(com, pending_read.buf);
+		helios::send(pending_read.reply, 0, n)!;
+		pending_read.reply = 0;
+		helios::destroy(pending_read.pages...)!;
+	};
+
+	return n;
+};
+```
+
+I hope that gives you a general idea of how drivers work in this environment!
+I encourage you to read the full implementation if you're curious to know more
+about the serial driver in particular -- it's just 370 lines of code.
+
+The last thing I want to show you is how the driver gets executed in the first
+place. When Helios boots up, it starts /sbin/sysinit, which is provided by
+Mercury and offers various low-level userspace runtime services, such as the
+device registry and bootstrap filesystem we saw earlier. After setting up its
+services, sysinit executes /sbin/usrinit, which is provided by the next layer
+up (Gaia, eventually) and sets up the rest of the system according to user
+policy, mounting filesystems and starting up drivers and such. At the moment,
+usrinit is fairly simple, and just runs a little demo. Here it is in full:
+
+```hare
+use dev;
+use fs;
+use helios;
+use io;
+use log;
+use rt;
+use sys;
+
+export fn main() void = {
+	const fs = helios::service(fs::FS_ID);
+	const procmgr = helios::service(sys::PROCMGR_ID);
+	const devmgr = helios::service(sys::DEVMGR_ID);
+	const devload = helios::service(sys::DEVLOADER_ID);
+
+	log::printfln("[usrinit] Running /sbin/drv/serial");
+	let proc: helios::cap = 0;
+	const image = fs::open(fs, "/sbin/drv/serial")!;
+	sys::procmgr_new(procmgr, &proc);
+	sys::devloader_load(devload, proc, image);
+	sys::process_start(proc);
+
+	let serial: helios::cap = 0;
+	log::printfln("[usrinit] open device serial0");
+	sys::devmgr_open(devmgr, dev::SERIAL_ID, 0, &serial);
+
+	let buf: [rt::PAGESIZE]u8 = [0...];
+	for (true) {
+		const n = match (io::read(serial, buf)!) {
+		case let n: size =>
+			yield n;
+		case io::EOF =>
+			break;
+		};
+
+		// CR => LF
+		for (let i = 0z; i < n; i += 1) {
+			if (buf[i] == '\r') {
+				buf[i] = '\n';
+			};
+		};
+
+		// echo
+		io::write(serial, buf[..n])!;
+	};
+};
+```
+
+The services shown at the start are automatically provided in usrinit's
+aux vector by sysinit, and include all of the services required to bootstrap
+the system. This includes a filesystem (the initramfs), a process manager (to
+start up new processes), the device manager, and the driver loader service.
+
+usrinit starts by opening up /sbin/drv/serial (the serial driver, of course)
+from the provided initramfs using fs::open, which is a convenience wrapper
+around the filesystem protocol. Then we create a new process with the process
+manager, which by default has an empty address space -- we could load a normal
+process into it with sys::process\_load, but we want to load a driver, so we
+use the devloader interface instead. Then we start the process and boom: the
+serial driver is online.
+
+The serial driver registers itself with the device registry, which means that we
+can use the device manager to open the 0th device which implements the serial
+interface. Since this is compatible with the io::file interface, it can simply
+be used normally with io::read and io::write to utilize the serial port. The
+main loop simply echoes data read from the serial port back out. Simple!
+
+---
+
+That's a quick introduction to the driver environment provided by Mercury. I
+intend to write a few more drivers soon myself -- PC keyboard, framebuffer,
+etc -- and set up a simple shell. We have seen a few sample drivers written
+pre-Mercury which would be nice to bring into this environment, such as virtio
+networking and block devices. It will be nice to see them re-introduced in an
+environment where they can provide useful services to the rest of userspace.
+
+If you're interested in learning more about Helios or Mercury, consult
+[ares-os.org](https://ares-os.org) for documentation -- though beware of the
+many stub pages. If you have any questions or want to get involved in writing
+some drivers yourself, jump into our IRC channel: #helios on Libera Chat.