Kernel-hacking-with-Hare-part-1.md (11746B)
- ---
- title: Notes from kernel hacking in Hare, part 1
- date: 2022-09-07
- ---
- One of the goals for the [Hare][0] programming language is to be able to write
- kernels, such as my [Helios][1] project. Kernels are complex beasts which exist
- in a somewhat unique problem space and have constraints that many userspace
- programs are not accustomed to. To illustrate this, I'm going to highlight a
- scenario where Hare's low-level types and manual memory management approach
- shines to enable a difficult use-case.
- [0]: https://harelang.org/
- [1]: https://git.sr.ht/~sircmpwn/helios
- Helios is a micro-kernel. During system initialization, its job is to load the
- initial task into memory, prepare the initial set of kernel objects for its use,
- provide it with information about the system, then jump to userspace and fuck
- off until someone needs it again. I'm going to focus on the "providing
- information" step here.
- The information the kernel needs to provide includes details about the
- capabilities that init has access to (such as working with I/O ports),
- information about system memory, the address of the framebuffer, and so on. This
- information is provided to init in the bootinfo structure, which is mapped into
- its address space, and passed to init via a register which points to this
- structure.[^1]
- [^1]: %rdi, if you were curious. Helios uses the System-V ABI, where %rdi is
- used as the first parameter to a function call. This isn't exactly a function
- call but the precedent is useful.
- ```hare
- // The bootinfo structure.
- export type bootinfo = struct {
- argv: str,
- // Capability ranges
- memory: cap_range,
- devmem: cap_range,
- userimage: cap_range,
- stack: cap_range,
- bootinfo: cap_range,
- unused: cap_range,
- // Other details
- arch: *arch_bootinfo,
- ipcbuf: uintptr,
- modules: []module_desc,
- memory_info: []memory_desc,
- devmem_info: []memory_desc,
- tls_base: uintptr,
- tls_size: size,
- };
- ```
- Parts of this structure are static (such as the capability number ranges for
- each capability assigned to init), and others are dynamic - such as structures
- describing the memory layout (N items where N is the number of memory regions),
- or the kernel command line. But, we're in a kernel -- dynamically allocating
- data is not so straightforward, especially for units smaller than a page\![^2]
- Moreover, the data structures allocated here need to be visible to userspace,
- and kernel memory is typically not available to userspace. A further
- complication is the three different address spaces we're working with here: a
- bootinfo object has a physical memory address, a kernel address, and a userspace
- address — three addresses to refer to a single object in different
- contexts.
- [^2]: 4096 bytes.
- Here's an example of what the code shown in this article is going to produce:
- ![A 64 by 64 grid of cells representing a page of physical memory. The first set
- of cells are colored blue; the next set green; then purple; the remainder are
- brown.](https://l.sr.ht/FpGq.png)
- This is a single page of physical memory which has been allocated for the
- bootinfo data, where each cell is a byte. The bootinfo structure itself comes
- first, in blue. Following this is an arch-specific bootinfo structure, in green:
- ```hare
- // x86_64-specific boot information
- export type arch_bootinfo = struct {
- // Page table capabilities
- pdpt: cap_range,
- pd: cap_range,
- pt: cap_range,
- // vbe_mode_info physical address from multiboot (or zero)
- vbe_mode_info: uintptr,
- };
- ```
- After this, in purple, is the kernel command line. These three structures are
- always consistently allocated for any boot configuration, so the code which
- sets up the bootinfo page (the code we're going to read now) always provisions
- them. Following these three items is a large area of free space (indicated in
- brown) which will be used to populate further dynamically allocated bootinfo
- structures, such as descriptions of physical memory regions.
- The code to set this up is `bootinfo_init`, which is responsible for allocating
- a suitable page, filling in the bootinfo structure, and preparing a vector to
- dynamically allocate additional data on this page. It also sets up the arch
- bootinfo and argv, so the page looks like this diagram when the function
- returns. And here it is, in its full glory:
- ```hare
- // Initializes the bootinfo context.
- export fn bootinfo_init(heap: *heap, argv: str) bootinfo_ctx = {
- let cslot = caps::cslot { ... };
- let page = objects::init(ctype::PAGE, &cslot, &heap.memory)!;
- let phys = objects::page_phys(page);
- let info = mem::phys_tokernel(phys): *bootinfo;
- const bisz = size(bootinfo);
- let bootvec = (info: *[*]u8)[bisz..arch::PAGESIZE][..0];
- let ctx = bootinfo_ctx {
- page = cslot,
- info = info,
- arch = null: *arch_bootinfo, // Fixed up below
- bootvec = bootvec,
- };
- let (vec, user) = mkbootvec(&ctx, size(arch_bootinfo), size(uintptr));
- ctx.arch = vec: *[*]u8: *arch_bootinfo;
- info.arch = user: *arch_bootinfo;
- let (vec, user) = mkbootvec(&ctx, len(argv), 1);
- vec[..] = strings::toutf8(argv)[..];
- info.argv = *(&types::string {
- data = user: *[*]u8,
- length = len(argv),
- capacity = len(argv),
- }: *str);
- return ctx;
- };
- ```
- The first three lines are fairly straightforward. Helios uses capability-based
- security, similar in design to [seL4][seL4]. All kernel objects — such as
- pages of physical memory — are utilized through the capability system. The
- first two lines set aside a slot to store the page capability in, then allocate
- a page using that slot. The next two lines grab the page's physical address and
- use `mem::phys_tokernel` to convert it to an address in the kernel's virtual
- address space, so that the kernel can write data to this page.
- [seL4]: https://sel4.systems/
- The next two lines are where it starts to get a little bit interesting:
- ```hare
- const bisz = size(bootinfo);
- let bootvec = (info: *[*]u8)[bisz..arch::PAGESIZE][..0];
- ```
- This casts the "info" variable (of type \*bootinfo) to a pointer to an
- *unbounded* array of bytes (\*\[\*\]u8). This is a little bit dangerous! Hare's
- arrays are bounds tested by default and using an unbounded type disables this
- safety feature. We want to get a bounded slice again soon, which is what the
- first slicing operator here does: `[bisz..arch::PAGESIZE]`. This obtains a
- *bounded* slice of bytes which starts from the end of the bootinfo structure and
- continues to the end of the page.
- The last expression, another slicing expression, is a little bit unusual. A
- slice type in Hare has the following internal representation:
- ```hare
- type slice = struct {
- data: nullable *void,
- length: size,
- capacity: size,
- };
- ```
- When you slice an unbounded array, you get a slice whose length and capacity
- fields are equal to the length of the slicing operation, in this case
- `arch::PAGESIZE - bisz`. But when you slice a *bounded* slice, the length field
- takes on the length of the slicing expression but the capacity field is
- calculated from the original slice. So by slicing our new bounded slice to the
- 0th index (\[..0\]), we obtain the following slice:
- ```hare
- slice {
- data = &(info: *[*]bootinfo)[1]: *[*]u8,
- length = 0,
- capacity = arch::PAGESIZE - bisz,
- };
- ```
- In plain English, this is a slice whose base address is the address following
- the bootinfo structure and whose capacity is the remainder of the free space on
- its page, with a length of zero. This is something we can use <span
- class="rainbow">static append</span> with\![^3]
- [^3]: Thanks to [Rahul of W3Bits](https://w3bits.com/rainbow-text/) for this CSS.
- <style>
- .rainbow {
- background-image: linear-gradient(to left, violet, indigo, blue, green, yellow, orange, red);
- background-clip: text;
- background-size: 800% 800%;
- animation: rainbow 8s ease infinite;
- -webkit-text-fill-color: transparent;
- }
- @keyframes rainbow {
- 0%{background-position:0% 50%}
- 50%{background-position:100% 25%}
- 100%{background-position:0% 50%}
- }
- </style>
- ```hare
- // Allocates a buffer in the bootinfo vector, returning the kernel vector and a
- // pointer to the structure in the init vspace.
- fn mkbootvec(info: *bootinfo_ctx, sz: size, al: size) ([]u8, uintptr) = {
- const prevlen = len(info.bootvec);
- let padding = 0z;
- if (prevlen % al != 0) {
- padding = al - prevlen % al;
- };
- static append(info.bootvec, [0...], sz + padding);
- const vec = info.bootvec[prevlen + padding..];
- return (vec, INIT_BOOTINFO_ADDR + size(bootinfo): uintptr prevlen: uintptr);
- };
- ```
- In Hare, slices can be dynamically grown and shrunk using the *append*,
- *insert*, and *delete* keywords. This is pretty useful, but not applicable for
- our kernel — remember, no dynamic memory allocation. Attempting to use
- append in Helios would cause a linking error because the necessary runtime code
- is absent from the kernel's Hare runtime. However, you can also *statically*
- append to a slice, as shown here. So long as the slice has a sufficient capacity
- to store the appended data, a static append or insert will succeed. If not, an
- assertion is thrown at runtime, much like a normal bounds test.
- This function makes good use of it to dynamically allocate memory from the
- bootinfo page. Given a desired size and alignment, it statically appends a
- suitable number of zeroes to the page, takes a slice of the new data, and
- returns both that slice (in the kernel's address space) and the address that
- data will have in the user address space. If we return to the earlier function,
- we can see how this is used to allocate space for the arch\_bootinfo structure:
- ```hare
- let (vec, user) = mkbootvec(&ctx, size(arch_bootinfo), size(uintptr));
- ctx.arch = vec: *[*]u8: *arch_bootinfo;
- info.arch = user: *arch_bootinfo;
- ```
- The "ctx" variable is used by the kernel to keep track of its state while
- setting up the init task, and we stash the kernel's pointer to this data
- structure in there, and the user's pointer in the bootinfo structure itself.
- This is also used to place argv into the bootinfo page:
- ```hare
- let (vec, user) = mkbootvec(&ctx, len(argv), 1);
- vec[..] = strings::toutf8(argv)[..];
- info.argv = *(&types::string {
- data = user: *[*]u8,
- length = len(argv),
- capacity = len(argv),
- }: *str);
- ```
- Here we allocate a vector whose length is the length of the argument string,
- with an alignment of one, and then copy argv into it as a UTF-8 slice. Slice
- copy expressions like this one are a type-safe and memory-safe way to memcpy in
- Hare. Then we do something a bit more interesting.
- Like slices, strings have an internal representation in Hare which includes a
- data pointer, length, and capacity. The types module provides a struct with this
- representation so that you can do low-level string manipulation in Hare should
- the task call for it. Hare's syntax allows us to take the address of a literal
- value, such as a types::string struct, using the & operator. Then we cast it to
- a pointer to a string and dereference it. Ta-da! We set the bootinfo argv field
- to a str value which uses the user address of the argument vector.
- Some use-cases call for this level of fine control over the precise behavior of
- your program. Hare's goal is to accommodate this need with little fanfare. Here
- we've drawn well outside of the lines of Hare's safety features, but sometimes
- it's useful and necessary to do so. And Hare provides us with the tools to get
- the safety harness back on quickly, such as we saw with the construction of the
- bootvec slice. This code is pretty weird but to an experienced Hare programmer
- (which, I must admit, the world has very few of) it should make sense.
- I hope you found this interesting! I'm going back to kernel hacking. Next up is
- loading the userspace ELF image into its address space. I had this working
- before but decided to rewrite it. Wish me good luck!