logo

drewdevault.com

[mirror] blog and personal website of Drew DeVault git clone https://hacktivis.me/git/mirror/drewdevault.com.git

Kernel-hacking-with-Hare-part-1.md (11746B)


  1. ---
  2. title: Notes from kernel hacking in Hare, part 1
  3. date: 2022-09-07
  4. ---
  5. One of the goals for the [Hare][0] programming language is to be able to write
  6. kernels, such as my [Helios][1] project. Kernels are complex beasts which exist
  7. in a somewhat unique problem space and have constraints that many userspace
  8. programs are not accustomed to. To illustrate this, I'm going to highlight a
  9. scenario where Hare's low-level types and manual memory management approach
  10. shines to enable a difficult use-case.
  11. [0]: https://harelang.org/
  12. [1]: https://git.sr.ht/~sircmpwn/helios
  13. Helios is a micro-kernel. During system initialization, its job is to load the
  14. initial task into memory, prepare the initial set of kernel objects for its use,
  15. provide it with information about the system, then jump to userspace and fuck
  16. off until someone needs it again. I'm going to focus on the "providing
  17. information" step here.
  18. The information the kernel needs to provide includes details about the
  19. capabilities that init has access to (such as working with I/O ports),
  20. information about system memory, the address of the framebuffer, and so on. This
  21. information is provided to init in the bootinfo structure, which is mapped into
  22. its address space, and passed to init via a register which points to this
  23. structure.[^1]
  24. [^1]: %rdi, if you were curious. Helios uses the System-V ABI, where %rdi is
  25. used as the first parameter to a function call. This isn't exactly a function
  26. call but the precedent is useful.
  27. ```hare
  28. // The bootinfo structure.
  29. export type bootinfo = struct {
  30. argv: str,
  31. // Capability ranges
  32. memory: cap_range,
  33. devmem: cap_range,
  34. userimage: cap_range,
  35. stack: cap_range,
  36. bootinfo: cap_range,
  37. unused: cap_range,
  38. // Other details
  39. arch: *arch_bootinfo,
  40. ipcbuf: uintptr,
  41. modules: []module_desc,
  42. memory_info: []memory_desc,
  43. devmem_info: []memory_desc,
  44. tls_base: uintptr,
  45. tls_size: size,
  46. };
  47. ```
  48. Parts of this structure are static (such as the capability number ranges for
  49. each capability assigned to init), and others are dynamic - such as structures
  50. describing the memory layout (N items where N is the number of memory regions),
  51. or the kernel command line. But, we're in a kernel -- dynamically allocating
  52. data is not so straightforward, especially for units smaller than a page\![^2]
  53. Moreover, the data structures allocated here need to be visible to userspace,
  54. and kernel memory is typically not available to userspace. A further
  55. complication is the three different address spaces we're working with here: a
  56. bootinfo object has a physical memory address, a kernel address, and a userspace
  57. address — three addresses to refer to a single object in different
  58. contexts.
  59. [^2]: 4096 bytes.
  60. Here's an example of what the code shown in this article is going to produce:
  61. ![A 64 by 64 grid of cells representing a page of physical memory. The first set
  62. of cells are colored blue; the next set green; then purple; the remainder are
  63. brown.](https://l.sr.ht/FpGq.png)
  64. This is a single page of physical memory which has been allocated for the
  65. bootinfo data, where each cell is a byte. The bootinfo structure itself comes
  66. first, in blue. Following this is an arch-specific bootinfo structure, in green:
  67. ```hare
  68. // x86_64-specific boot information
  69. export type arch_bootinfo = struct {
  70. // Page table capabilities
  71. pdpt: cap_range,
  72. pd: cap_range,
  73. pt: cap_range,
  74. // vbe_mode_info physical address from multiboot (or zero)
  75. vbe_mode_info: uintptr,
  76. };
  77. ```
  78. After this, in purple, is the kernel command line. These three structures are
  79. always consistently allocated for any boot configuration, so the code which
  80. sets up the bootinfo page (the code we're going to read now) always provisions
  81. them. Following these three items is a large area of free space (indicated in
  82. brown) which will be used to populate further dynamically allocated bootinfo
  83. structures, such as descriptions of physical memory regions.
  84. The code to set this up is `bootinfo_init`, which is responsible for allocating
  85. a suitable page, filling in the bootinfo structure, and preparing a vector to
  86. dynamically allocate additional data on this page. It also sets up the arch
  87. bootinfo and argv, so the page looks like this diagram when the function
  88. returns. And here it is, in its full glory:
  89. ```hare
  90. // Initializes the bootinfo context.
  91. export fn bootinfo_init(heap: *heap, argv: str) bootinfo_ctx = {
  92. let cslot = caps::cslot { ... };
  93. let page = objects::init(ctype::PAGE, &cslot, &heap.memory)!;
  94. let phys = objects::page_phys(page);
  95. let info = mem::phys_tokernel(phys): *bootinfo;
  96. const bisz = size(bootinfo);
  97. let bootvec = (info: *[*]u8)[bisz..arch::PAGESIZE][..0];
  98. let ctx = bootinfo_ctx {
  99. page = cslot,
  100. info = info,
  101. arch = null: *arch_bootinfo, // Fixed up below
  102. bootvec = bootvec,
  103. };
  104. let (vec, user) = mkbootvec(&ctx, size(arch_bootinfo), size(uintptr));
  105. ctx.arch = vec: *[*]u8: *arch_bootinfo;
  106. info.arch = user: *arch_bootinfo;
  107. let (vec, user) = mkbootvec(&ctx, len(argv), 1);
  108. vec[..] = strings::toutf8(argv)[..];
  109. info.argv = *(&types::string {
  110. data = user: *[*]u8,
  111. length = len(argv),
  112. capacity = len(argv),
  113. }: *str);
  114. return ctx;
  115. };
  116. ```
  117. The first three lines are fairly straightforward. Helios uses capability-based
  118. security, similar in design to [seL4][seL4]. All kernel objects — such as
  119. pages of physical memory — are utilized through the capability system. The
  120. first two lines set aside a slot to store the page capability in, then allocate
  121. a page using that slot. The next two lines grab the page's physical address and
  122. use `mem::phys_tokernel` to convert it to an address in the kernel's virtual
  123. address space, so that the kernel can write data to this page.
  124. [seL4]: https://sel4.systems/
  125. The next two lines are where it starts to get a little bit interesting:
  126. ```hare
  127. const bisz = size(bootinfo);
  128. let bootvec = (info: *[*]u8)[bisz..arch::PAGESIZE][..0];
  129. ```
  130. This casts the "info" variable (of type \*bootinfo) to a pointer to an
  131. *unbounded* array of bytes (\*\[\*\]u8). This is a little bit dangerous! Hare's
  132. arrays are bounds tested by default and using an unbounded type disables this
  133. safety feature. We want to get a bounded slice again soon, which is what the
  134. first slicing operator here does: `[bisz..arch::PAGESIZE]`. This obtains a
  135. *bounded* slice of bytes which starts from the end of the bootinfo structure and
  136. continues to the end of the page.
  137. The last expression, another slicing expression, is a little bit unusual. A
  138. slice type in Hare has the following internal representation:
  139. ```hare
  140. type slice = struct {
  141. data: nullable *void,
  142. length: size,
  143. capacity: size,
  144. };
  145. ```
  146. When you slice an unbounded array, you get a slice whose length and capacity
  147. fields are equal to the length of the slicing operation, in this case
  148. `arch::PAGESIZE - bisz`. But when you slice a *bounded* slice, the length field
  149. takes on the length of the slicing expression but the capacity field is
  150. calculated from the original slice. So by slicing our new bounded slice to the
  151. 0th index (\[..0\]), we obtain the following slice:
  152. ```hare
  153. slice {
  154. data = &(info: *[*]bootinfo)[1]: *[*]u8,
  155. length = 0,
  156. capacity = arch::PAGESIZE - bisz,
  157. };
  158. ```
  159. In plain English, this is a slice whose base address is the address following
  160. the bootinfo structure and whose capacity is the remainder of the free space on
  161. its page, with a length of zero. This is something we can use <span
  162. class="rainbow">static append</span> with\![^3]
  163. [^3]: Thanks to [Rahul of W3Bits](https://w3bits.com/rainbow-text/) for this CSS.
  164. <style>
  165. .rainbow {
  166. background-image: linear-gradient(to left, violet, indigo, blue, green, yellow, orange, red);
  167. background-clip: text;
  168. background-size: 800% 800%;
  169. animation: rainbow 8s ease infinite;
  170. -webkit-text-fill-color: transparent;
  171. }
  172. @keyframes rainbow {
  173. 0%{background-position:0% 50%}
  174. 50%{background-position:100% 25%}
  175. 100%{background-position:0% 50%}
  176. }
  177. </style>
  178. ```hare
  179. // Allocates a buffer in the bootinfo vector, returning the kernel vector and a
  180. // pointer to the structure in the init vspace.
  181. fn mkbootvec(info: *bootinfo_ctx, sz: size, al: size) ([]u8, uintptr) = {
  182. const prevlen = len(info.bootvec);
  183. let padding = 0z;
  184. if (prevlen % al != 0) {
  185. padding = al - prevlen % al;
  186. };
  187. static append(info.bootvec, [0...], sz + padding);
  188. const vec = info.bootvec[prevlen + padding..];
  189. return (vec, INIT_BOOTINFO_ADDR + size(bootinfo): uintptr prevlen: uintptr);
  190. };
  191. ```
  192. In Hare, slices can be dynamically grown and shrunk using the *append*,
  193. *insert*, and *delete* keywords. This is pretty useful, but not applicable for
  194. our kernel &mdash; remember, no dynamic memory allocation. Attempting to use
  195. append in Helios would cause a linking error because the necessary runtime code
  196. is absent from the kernel's Hare runtime. However, you can also *statically*
  197. append to a slice, as shown here. So long as the slice has a sufficient capacity
  198. to store the appended data, a static append or insert will succeed. If not, an
  199. assertion is thrown at runtime, much like a normal bounds test.
  200. This function makes good use of it to dynamically allocate memory from the
  201. bootinfo page. Given a desired size and alignment, it statically appends a
  202. suitable number of zeroes to the page, takes a slice of the new data, and
  203. returns both that slice (in the kernel's address space) and the address that
  204. data will have in the user address space. If we return to the earlier function,
  205. we can see how this is used to allocate space for the arch\_bootinfo structure:
  206. ```hare
  207. let (vec, user) = mkbootvec(&ctx, size(arch_bootinfo), size(uintptr));
  208. ctx.arch = vec: *[*]u8: *arch_bootinfo;
  209. info.arch = user: *arch_bootinfo;
  210. ```
  211. The "ctx" variable is used by the kernel to keep track of its state while
  212. setting up the init task, and we stash the kernel's pointer to this data
  213. structure in there, and the user's pointer in the bootinfo structure itself.
  214. This is also used to place argv into the bootinfo page:
  215. ```hare
  216. let (vec, user) = mkbootvec(&ctx, len(argv), 1);
  217. vec[..] = strings::toutf8(argv)[..];
  218. info.argv = *(&types::string {
  219. data = user: *[*]u8,
  220. length = len(argv),
  221. capacity = len(argv),
  222. }: *str);
  223. ```
  224. Here we allocate a vector whose length is the length of the argument string,
  225. with an alignment of one, and then copy argv into it as a UTF-8 slice. Slice
  226. copy expressions like this one are a type-safe and memory-safe way to memcpy in
  227. Hare. Then we do something a bit more interesting.
  228. Like slices, strings have an internal representation in Hare which includes a
  229. data pointer, length, and capacity. The types module provides a struct with this
  230. representation so that you can do low-level string manipulation in Hare should
  231. the task call for it. Hare's syntax allows us to take the address of a literal
  232. value, such as a types::string struct, using the & operator. Then we cast it to
  233. a pointer to a string and dereference it. Ta-da! We set the bootinfo argv field
  234. to a str value which uses the user address of the argument vector.
  235. Some use-cases call for this level of fine control over the precise behavior of
  236. your program. Hare's goal is to accommodate this need with little fanfare. Here
  237. we've drawn well outside of the lines of Hare's safety features, but sometimes
  238. it's useful and necessary to do so. And Hare provides us with the tools to get
  239. the safety harness back on quickly, such as we saw with the construction of the
  240. bootvec slice. This code is pretty weird but to an experienced Hare programmer
  241. (which, I must admit, the world has very few of) it should make sense.
  242. I hope you found this interesting! I'm going back to kernel hacking. Next up is
  243. loading the userspace ELF image into its address space. I had this working
  244. before but decided to rewrite it. Wish me good luck!