
drewdevault.com

[mirror] blog and personal website of Drew DeVault
git clone https://hacktivis.me/git/mirror/drewdevault.com.git
commit: 5b3d635865b12206a112946d2f2c1a79a77be6d8
parent 0dafcf4c5c22faedf7f70d238aef302e4bcb653b
Author: Drew DeVault <sir@cmpwn.com>
Date:   Mon, 20 Feb 2023 15:16:52 +0100

Helios aarch64 part 1

Diffstat:

A content/blog/2023-02-20-Helios-aarch64.md | 682 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 682 insertions(+), 0 deletions(-)

diff --git a/content/blog/2023-02-20-Helios-aarch64.md b/content/blog/2023-02-20-Helios-aarch64.md @@ -0,0 +1,682 @@ +--- +title: Porting Helios to aarch64 for my FOSDEM talk, part one +date: 2023-02-20 +--- + +[Helios] is a microkernel written in the [Hare] programming language, and the +subject of a talk I did at FOSDEM earlier this month. You can watch the talk +here if you like: + +[Helios]: https://sr.ht/~sircmpwn/helios +[Hare]: https://harelang.org + +<iframe title="FOSDEM 2023: Introducing the Helios microkernel" src="https://spacepub.space/videos/embed/f6435a6c-34e0-4602-ad5d-f791643111ab" allowfullscreen="" sandbox="allow-same-origin allow-scripts allow-popups" width="560" height="315" frameborder="0"></iframe> + +A while ago I promised someone that I would not do any talks on Helios until I +could present them from Helios itself, and at FOSDEM I made good on that +promise: my talk was presented from a Raspberry Pi 4 running Helios. The kernel +was originally designed for x86\_64 (though we were careful to avoid painting +ourselves into any corners so that we could port it to more architectures later +on), and I initially planned to write an Intel HD Graphics driver so that I +could drive the projector from my laptop. But, after a few days spent trying to +comprehend the IHD manuals, I decided it would be *much* easier to port the +entire system to aarch64 and write a driver for the much-simpler RPi GPU +instead. 42 days later the port was complete, and a week or so after that I +successfully presented the talk at FOSDEM. In a series of blog posts, I will +take a look at those 42 days of work and explain how the aarch64 port works. +Today's post focuses on the bootloader. + +The Helios boot-up process is: + +1. Bootloader starts up and loads the kernel, then jumps to it +2. The kernel configures the system and loads the init process +3. 
Kernel provides runtime services to init (and any subsequent processes) + +In theory, the port to aarch64 would address these steps in order, but in +practice step (2) relies heavily on the runtime services provided by step (3), +so much of the work was ordered 1, 3, 2. This blog post focuses on part 1, I'll +cover parts 2 and 3 and all of the fun problems they caused in later posts. + +In any case, the bootloader was the first step. Some basic changes to the build +system established boot/+aarch64 as the aarch64 bootloader, and a simple +qemu-specific ARM kernel was prepared which just gave a little "hello world" to +demonstrate the multi-arch build system was working as intended. More build +system refinements would come later, but it's off to the races from here. +Targeting qemu's aarch64 virt platform was useful for most of the initial +debugging and bring-up (and is generally useful at all times, as a much easier +platform to debug than real hardware); the first tests on real hardware came +much later. + +Booting up is a sore point on most systems. It involves a lot of arch-specific +procedures, but also generally calls for custom binary formats and annoying +things like disk drivers &mdash; which don't belong in a microkernel. So the +Helios bootloaders are separated from the kernel proper, which is a simple ELF +executable. The bootloader loads this ELF file into memory, configures a few +simple things, then passes some information along to the kernel entry point. The +bootloader's memory and other resources are hereafter abandoned and are later +reclaimed for general use. + +On aarch64 the boot story is pretty abysmal, and I wanted to avoid adding the +SoC-specific complexity which is endemic to the platform. Thus, two solutions +are called for: [EFI] and [device trees]. At the bootloader level, EFI is the +more important concern. For qemu-virt and Raspberry Pi, [edk2] is the +free-software implementation of choice when it comes to EFI. 
The first order of +business is producing an executable which can be loaded by EFI, which is, rather +unfortunately, based on the Windows [COFF/PE32+] format. I took inspiration from +Linux and made a disgusting EFI stub solution, which involves hand-writing a +PE32+ header in assembly and doing some truly horrifying things with binutils to +massage everything into order. Much of the header is lifted from Linux: + +[EFI]: https://uefi.org/specifications +[device trees]: https://www.devicetree.org/specifications/ +[edk2]: https://github.com/tianocore/edk2 +[COFF/PE32+]: https://learn.microsoft.com/en-us/windows/win32/debug/pe-format + +``` +.section .text.head +.global base +base: +.L_head: + /* DOS header */ + .ascii "MZ" + .skip 58 + .short .Lpe_header - .L_head + .align 4 +.Lpe_header: + .ascii "PE\0\0" + .short 0xAA64 /* Machine = AARCH64 */ + .short 2 /* NumberOfSections */ + .long 0 /* TimeDateStamp */ + .long 0 /* PointerToSymbolTable */ + .long 0 /* NumberOfSymbols */ + .short .Lsection_table - .Loptional_header /* SizeOfOptionalHeader */ + /* Characteristics: + * IMAGE_FILE_EXECUTABLE_IMAGE | + * IMAGE_FILE_LINE_NUMS_STRIPPED | + * IMAGE_FILE_DEBUG_STRIPPED */ + .short 0x206 +.Loptional_header: + .short 0x20b /* Magic = PE32+ (64-bit) */ + .byte 0x02 /* MajorLinkerVersion */ + .byte 0x14 /* MinorLinkerVersion */ + .long _data - .Lefi_header_end /* SizeOfCode */ + .long __pecoff_data_size /* SizeOfInitializedData */ + .long 0 /* SizeOfUninitializedData */ + .long _start - .L_head /* AddressOfEntryPoint */ + .long .Lefi_header_end - .L_head /* BaseOfCode */ +.Lextra_header: + .quad 0 /* ImageBase */ + .long 4096 /* SectionAlignment */ + .long 512 /* FileAlignment */ + .short 0 /* MajorOperatingSystemVersion */ + .short 0 /* MinorOperatingSystemVersion */ + .short 0 /* MajorImageVersion */ + .short 0 /* MinorImageVersion */ + .short 0 /* MajorSubsystemVersion */ + .short 0 /* MinorSubsystemVersion */ + .long 0 /* Reserved */ + + .long _end - .L_head /* 
SizeOfImage */ + + .long .Lefi_header_end - .L_head /* SizeOfHeaders */ + .long 0 /* CheckSum */ + .short 10 /* Subsystem = EFI application */ + .short 0 /* DLLCharacteristics */ + .quad 0 /* SizeOfStackReserve */ + .quad 0 /* SizeOfStackCommit */ + .quad 0 /* SizeOfHeapReserve */ + .quad 0 /* SizeOfHeapCommit */ + .long 0 /* LoaderFlags */ + .long 6 /* NumberOfRvaAndSizes */ + + .quad 0 /* Export table */ + .quad 0 /* Import table */ + .quad 0 /* Resource table */ + .quad 0 /* Exception table */ + .quad 0 /* Certificate table */ + .quad 0 /* Base relocation table */ + +.Lsection_table: + .ascii ".text\0\0\0" /* Name */ + .long _etext - .Lefi_header_end /* VirtualSize */ + .long .Lefi_header_end - .L_head /* VirtualAddress */ + .long _etext - .Lefi_header_end /* SizeOfRawData */ + .long .Lefi_header_end - .L_head /* PointerToRawData */ + .long 0 /* PointerToRelocations */ + .long 0 /* PointerToLinenumbers */ + .short 0 /* NumberOfRelocations */ + .short 0 /* NumberOfLinenumbers */ + /* IMAGE_SCN_CNT_CODE | IMAGE_SCN_MEM_READ | IMAGE_SCN_MEM_EXECUTE */ + .long 0x60000020 + + .ascii ".data\0\0\0" /* Name */ + .long __pecoff_data_size /* VirtualSize */ + .long _data - .L_head /* VirtualAddress */ + .long __pecoff_data_rawsize /* SizeOfRawData */ + .long _data - .L_head /* PointerToRawData */ + .long 0 /* PointerToRelocations */ + .long 0 /* PointerToLinenumbers */ + .short 0 /* NumberOfRelocations */ + .short 0 /* NumberOfLinenumbers */ + /* IMAGE_SCN_CNT_INITIALIZED_DATA | IMAGE_SCN_MEM_READ | IMAGE_SCN_MEM_WRITE */ + .long 0xc0000040 + +.balign 0x10000 +.Lefi_header_end: + +.global _start +_start: + stp x0, x1, [sp, -16]! 
+ + adrp x0, base + add x0, x0, #:lo12:base + adrp x1, _DYNAMIC + add x1, x1, #:lo12:_DYNAMIC + bl relocate + cmp w0, #0 + bne 0f + + ldp x0, x1, [sp], 16 + + b bmain + +0: + /* relocation failed */ + add sp, sp, -16 + ret +``` + +The specific details about how any of this works are complex and unpleasant, +I'll refer you to the spec if you're curious, and offer a general suggestion +that cargo-culting my work here would be a lot easier than understanding it +should you need to build something similar.[^1] + +[^1]: A cursory review of this code while writing this blog post draws my + attention to a few things that ought to be improved as well. + +Note the entry point for later; we store two arguments from EFI (x0 and x1) on +the stack and eventually branch to bmain. + +This file is assisted by the linker script: + +``` +ENTRY(_start) +OUTPUT_FORMAT(elf64-littleaarch64) + +SECTIONS { + /DISCARD/ : { + *(.rel.reloc) + *(.eh_frame) + *(.note.GNU-stack) + *(.interp) + *(.dynsym .dynstr .hash .gnu.hash) + } + + . = 0xffff800000000000; + + .text.head : { + _head = .; + KEEP(*(.text.head)) + } + + .text : ALIGN(64K) { + _text = .; + KEEP(*(.text)) + *(.text.*) + . = ALIGN(16); + *(.got) + } + + . = ALIGN(64K); + _etext = .; + + .dynamic : { + *(.dynamic) + } + + .data : ALIGN(64K) { + _data = .; + KEEP(*(.data)) + *(.data.*) + + /* Reserve page tables */ + . = ALIGN(4K); + L0 = .; + . += 512 * 8; + L1_ident = .; + . += 512 * 8; + L1_devident = .; + . += 512 * 8; + L1_kernel = .; + . += 512 * 8; + L2_kernel = .; + . += 512 * 8; + L3_kernel = .; + . += 512 * 8; + } + + .rela.text : { + *(.rela.text) + *(.rela.text*) + } + .rela.dyn : { + *(.rela.dyn) + } + .rela.plt : { + *(.rela.plt) + } + .rela.got : { + *(.rela.got) + } + .rela.data : { + *(.rela.data) + *(.rela.data*) + } + + .pecoff_edata_padding : { + BYTE(0); + . = ALIGN(512); + } + __pecoff_data_rawsize = ABSOLUTE(. - _data); + _edata = .; + + .bss : ALIGN(4K) { + KEEP(*(.bss)) + *(.bss.*) + *(.dynbss) + } + + . 
= ALIGN(64K); + __pecoff_data_size = ABSOLUTE(. - _data); + _end = .; +} +``` + +Items of note here are the careful treatment of relocation sections +(cargo-culted from earlier work on RISC-V with Hare; not actually necessary as +qbe generates PIC for aarch64)[^2] and the extra symbols used to gather +information for the PE32+ header. Padding is also added in the required places, +and static aarch64 page tables are defined for later use. + +[^2]: PIC stands for "position independent code". EFI can load executables at + any location in memory and the code needs to be prepared to deal with that; + PIC is the tool we use for this purpose. + +This is built as a shared object, and the Makefile ~~mutilates~~ reformats the +resulting ELF file to produce a PE32+ executable: + +``` +$(BOOT)/bootaa64.so: $(BOOT_OBJS) $(BOOT)/link.ld + $(LD) -Bsymbolic -shared --no-undefined \ + -T $(BOOT)/link.ld \ + $(BOOT_OBJS) \ + -o $@ + +$(BOOT)/bootaa64.efi: $(BOOT)/bootaa64.so + $(OBJCOPY) -Obinary \ + -j .text.head -j .text -j .dynamic -j .data \ + -j .pecoff_edata_padding \ + -j .dynstr -j .dynsym \ + -j .rel -j .rel.* -j .rel* \ + -j .rela -j .rela.* -j .rela* \ + $< $@ +``` + +With all of this mess sorted, and the PE32+ entry point branching to bmain, we +can finally enter some Hare code: + +``` +export fn bmain( + image_handle: efi::HANDLE, + systab: *efi::SYSTEM_TABLE, +) efi::STATUS = { + // ... +}; +``` + +Getting just this far took 3 full days of work. + +Initially, the Hare code incorporated a lot of proof-of-concept work from Alexey +Yerin's "carrot" kernel prototype for RISC-V, which also booted via EFI. +Following the early bringing-up of the bootloader environment, this was +refactored into a more robust and general-purpose EFI support layer for Helios, +which will be applicable to future ports. You can review the EFI support +module's haredocs [here](https://mirror.drewdevault.com/efi.html). 
The purpose +of this module is to provide an idiomatic Hare-oriented interface to the EFI +boot services, which the bootloader makes use of mainly to read files from the +boot media and examine the system's memory map. + +Let's take a look at the first few lines of bmain: + +``` +efi::init(image_handle, systab)!; + +const eficons = eficons_init(systab); +log::setcons(&eficons); +log::printfln("Booting Helios aarch64 via EFI"); + +if (readel() == el::EL3) { + log::printfln("Booting from EL3 is not supported"); + return efi::STATUS::LOAD_ERROR; +}; + +let mem = allocator { ... }; +init_mmap(&mem); +init_pagetables(); +``` + +Significant build system overhauls were required such that Hare modules from +the kernel like log (and, later, other modules like elf) could be incorporated +into the bootloader, simplifying the process of implementing more complex +bootloaders. The first call of note here is init\_mmap, which scans the EFI +memory map and prepares a simple high-watermark allocator to be used by the +bootloader to allocate memory for the kernel image and other items of interest. +It's quite simple, it just finds the largest area of general-purpose memory and +sets up an allocator with it: + +``` +// Loads the memory map from EFI and initializes a page allocator using the +// largest area of physical memory. +fn init_mmap(mem: *allocator) void = { + const iter = efi::iter_mmap()!; + let maxphys: uintptr = 0, maxpages = 0u64; + for (true) { + const desc = match (efi::mmap_next(&iter)) { + case let desc: *efi::MEMORY_DESCRIPTOR => + yield desc; + case void => + break; + }; + if (desc.DescriptorType != efi::MEMORY_TYPE::CONVENTIONAL) { + continue; + }; + if (desc.NumberOfPages > maxpages) { + maxphys = desc.PhysicalStart; + maxpages = desc.NumberOfPages; + }; + }; + assert(maxphys != 0, "No suitable memory area found for kernel loader"); + assert(maxpages <= types::UINT_MAX); + pagealloc_init(mem, maxphys, maxpages: uint); +}; +``` + +init\_pagetables is next. 
This populates the page tables reserved by the linker +with the desired higher-half memory map, illustrated in the comments shown here: + +``` +fn init_pagetables() void = { + // 0xFFFF0000xxxxxxxx - 0xFFFF0200xxxxxxxx: identity map + // 0xFFFF0200xxxxxxxx - 0xFFFF0400xxxxxxxx: identity map (dev) + // 0xFFFF8000xxxxxxxx - 0xFFFF8000xxxxxxxx: kernel image + // + // L0[0x000] => L1_ident + // L0[0x004] => L1_devident + // L1_ident[*] => 1 GiB identity mappings + // L0[0x100] => L1_kernel + // L1_kernel[0] => L2_kernel + // L2_kernel[0] => L3_kernel + // L3_kernel[0] => 4 KiB kernel pages + L0[0x000] = PT_TABLE | &L1_ident: uintptr | PT_AF; + L0[0x004] = PT_TABLE | &L1_devident: uintptr | PT_AF; + L0[0x100] = PT_TABLE | &L1_kernel: uintptr | PT_AF; + L1_kernel[0] = PT_TABLE | &L2_kernel: uintptr | PT_AF; + L2_kernel[0] = PT_TABLE | &L3_kernel: uintptr | PT_AF; + for (let i = 0u64; i < len(L1_ident): u64; i += 1) { + L1_ident[i] = PT_BLOCK | (i * 0x40000000): uintptr | + PT_NORMAL | PT_AF | PT_ISM | PT_RW; + }; + for (let i = 0u64; i < len(L1_devident): u64; i += 1) { + L1_devident[i] = PT_BLOCK | (i * 0x40000000): uintptr | + PT_DEVICE | PT_AF | PT_ISM | PT_RW; + }; +}; +``` + +In short, we want three larger memory regions to be available: an identity map, +where physical memory addresses correlate 1:1 with virtual memory, an identity +map configured for device MMIO (e.g. with caching disabled), and an area to load +the kernel image. The first two are straightforward, they use uniform 1 GiB +mappings to populate their respective page tables. The latter is slightly more +complex, ultimately the kernel is loaded in 4 KiB pages so we need to set up +intermediate page tables for that purpose. + +We cannot actually enable these page tables until we're finished making use of +the EFI boot services &mdash; the EFI specification requires us to preserve the +online memory map at this stage of affairs. 
However, this does lay the +groundwork for the kernel loader: we have an allocator to provide pages of +memory, and page tables to set up virtual memory mappings that can be activated +once we're done with EFI. bmain thus proceeds with loading the kernel: + +``` +const kernel = match (efi::open("\\helios", efi::FILE_MODE::READ)) { +case let file: *efi::FILE_PROTOCOL => + yield file; +case let err: efi::error => + log::printfln("Error: no kernel found at /helios"); + return err: efi::STATUS; +}; + +log::printfln("Load kernel /helios"); +const kentry = match (load(&mem, kernel)) { +case let err: efi::error => + return err: efi::STATUS; +case let entry: uintptr => + yield entry: *kentry; +}; +efi::close(kernel)!; +``` + +The loader itself (the "load" function here) is a relatively straightforward ELF +loader; if you've seen one you've seen them all. Nevertheless, you may browse it +[online][0] if you so wish. The only item of note here is the function used for +mapping kernel pages: + +[0]: https://git.sr.ht/~sircmpwn/helios/tree/02d0490487c7a0fb4b0367b95819e808b98f87fb/item/boot/%2Baarch64/loader.ha + +``` +// Maps a physical page into the kernel's virtual address space. +fn kmmap(virt: uintptr, phys: uintptr, flags: uintptr) void = { + assert(virt & ~0x1ff000 == 0xffff800000000000: uintptr); + const offs = (virt >> 12) & 0x1ff; + L3_kernel[offs] = PT_PAGE | PT_NORMAL | PT_AF | PT_ISM | phys | flags; +}; +``` + +The assertion enforces a constraint which is implemented by our kernel linker +script, namely that all loadable kernel program headers are located within the +kernel's reserved address space. With this constraint in place, the +implementation is simpler than many mmap implementations; we can assume that +L3\_kernel is the correct page table and just load it up with the desired +physical address and mapping flags. 
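
To make the index arithmetic concrete, here is a minimal host-side sketch in Python (not Helios code). It assumes the 4 KiB granule and 48-bit address layout described above; the helper names `pt_index` and `in_kernel_window` and the sample address are hypothetical, chosen only for illustration.

```python
# Host-side sketch (not Helios code): split an aarch64 virtual address into
# translation-table indices, assuming a 4 KiB granule and 48-bit addresses.
PAGE_SHIFT = 12      # 4 KiB pages
BITS_PER_LEVEL = 9   # 512 entries of 8 bytes per table


def pt_index(virt: int, level: int) -> int:
    """Return the index into the table at `level` (0 = L0 ... 3 = L3)."""
    shift = PAGE_SHIFT + BITS_PER_LEVEL * (3 - level)
    return (virt >> shift) & 0x1FF


def in_kernel_window(virt: int) -> bool:
    """Mirror kmmap's assertion: the address must be page-aligned and only
    the L3 index bits (12..20) may differ from 0xffff800000000000."""
    mask = 0xFFFFFFFFFFFFFFFF & ~0x1FF000
    return (virt & mask) == 0xFFFF800000000000


virt = 0xFFFF800000003000  # hypothetical kernel page address
indices = [pt_index(virt, level) for level in range(4)]
print([hex(i) for i in indices])
```

Under these assumptions the sample address selects L0[0x100], L1[0], L2[0] and L3[3], which agrees with the `L0[0x100] => L1_kernel` mapping in the comments of init\_pagetables. The same helper gives index 0x004 at L0 for 0xFFFF020000000000, the start of the device identity map.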
+ +Following the kernel loader, the bootloader addresses other items of interest, +such as loading the device tree and boot modules &mdash; which include, for +instance, the init process image and an initramfs. It also allocates & populates +data structures with information which will be of later use to the kernel, +including the memory map. This code is relatively straightforward and not +particularly interesting; most of these processes take advantage of the same +straightforward Hare function: + +``` +// Loads a file into contiguous pages of memory and returns its physical +// address. +fn load_file( + mem: *allocator, + file: *efi::FILE_PROTOCOL, +) (uintptr | efi::error) = { + const info = efi::file_info(file)?; + const fsize = info.FileSize: size; + let npage = fsize / PAGESIZE; + if (fsize % PAGESIZE != 0) { + npage += 1; + }; + + let base: uintptr = 0; + for (let i = 0z; i < npage; i += 1) { + const phys = pagealloc(mem); + if (base == 0) { + base = phys; + }; + + const nbyte = if ((i + 1) * PAGESIZE > fsize) { + yield fsize % PAGESIZE; + } else { + yield PAGESIZE; + }; + let dest = (phys: *[*]u8)[..nbyte]; + const n = efi::read(file, dest)?; + assert(n == nbyte); + }; + + return base; +}; +``` + +It is not necessary to map these into virtual memory anywhere, the kernel later +uses the identity-mapped physical memory region in the higher half to read +them. Tasks of interest resume at the end of bmain: + +``` +efi::exit_boot_services(); +init_mmu(); +enter_kernel(kentry, ctx); +``` + +Once we exit boot services, we are free to configure the MMU according to our +desired specifications and make good use of all of the work done earlier to +prepare a kernel memory map. Thus, init\_mmu: + +``` +// Initializes the ARM MMU to our desired specifications. This should take place +// *after* EFI boot services have exited because we're going to mess up the MMU +// configuration that it depends on. 
+fn init_mmu() void = { + // Disable MMU + const sctlr_el1 = rdsctlr_el1(); + wrsctlr_el1(sctlr_el1 & ~SCTLR_EL1_M); + + // Configure MAIR + const mair: u64 = + (0xFF << 0) | // Attr0: Normal memory; IWBWA, OWBWA, NTR + (0x00 << 8); // Attr1: Device memory; nGnRnE, OSH + wrmair_el1(mair); + + const tsz: u64 = 64 - 48; + const ips = rdtcr_el1() & TCR_EL1_IPS_MASK; + const tcr_el1: u64 = + TCR_EL1_IPS_42B_4T | // 4 TiB IPS + TCR_EL1_TG1_4K | // Higher half: 4K granule size + TCR_EL1_SH1_IS | // Higher half: inner shareable + TCR_EL1_ORGN1_WB | // Higher half: outer write-back + TCR_EL1_IRGN1_WB | // Higher half: inner write-back + (tsz << TCR_EL1_T1SZ) | // Higher half: 48 bits + TCR_EL1_TG0_4K | // Lower half: 4K granule size + TCR_EL1_SH0_IS | // Lower half: inner sharable + TCR_EL1_ORGN0_WB | // Lower half: outer write-back + TCR_EL1_IRGN0_WB | // Lower half: inner write-back + (tsz << TCR_EL1_T0SZ); // Lower half: 48 bits + wrtcr_el1(tcr_el1); + + // Load page tables + wrttbr0_el1(&L0[0]: uintptr); + wrttbr1_el1(&L0[0]: uintptr); + invlall(); + + // Enable MMU + const sctlr_el1: u64 = + SCTLR_EL1_M | // Enable MMU + SCTLR_EL1_C | // Enable cache + SCTLR_EL1_I | // Enable instruction cache + SCTLR_EL1_SPAN | // SPAN? + SCTLR_EL1_NTLSMD | // NTLSMD? + SCTLR_EL1_LSMAOE | // LSMAOE? + SCTLR_EL1_TSCXT | // TSCXT? + SCTLR_EL1_ITD; // ITD? + wrsctlr_el1(sctlr_el1); +}; +``` + +There are a lot of bits here! Figuring out which ones to enable or disable was a +project in and of itself. One of the major challenges, funnily enough, was +finding the correct ARM manual to reference to understand all of these +registers. I'll save you some time and [link to it][1] directly, should you ever +find yourself writing similar code. Some question marks in comments towards the +end point out some flags that I'm still not sure about. 
The ARM CPU is *very* +configurable and identifying the configuration that produces the desired +behavior for a general-purpose kernel requires some effort. + +[1]: https://mirror.drewdevault.com/ARMARM.pdf + +After this function completes, the MMU is initialized and we are up and running +with the kernel memory map we prepared earlier; the kernel is loaded in the +higher half and the MMU is prepared to service it. So, we can jump to the kernel +via enter\_kernel: + +``` +@noreturn fn enter_kernel(entry: *kentry, ctx: *bootctx) void = { + const el = readel(); + switch (el) { + case el::EL0 => + abort("Bootloader running in EL0, breaks EFI invariant"); + case el::EL1 => + // Can boot immediately + entry(ctx); + case el::EL2 => + // Boot from EL2 => EL1 + // + // This is the bare minimum necessary to get to EL1. Future + // improvements might be called for here if anyone wants to + // implement hardware virtualization on aarch64. Good luck to + // this future hacker. + + // Enable EL1 access to the physical counter register + const cnt = rdcnthctl_el2(); + wrcnthctl_el2(cnt | 0b11); + + // Enable aarch64 in EL1 & SWIO, disable most other EL2 things + // Note: I bet someday I'll return to this line because of + // Problems + const hcr: u64 = (1 << 1) | (1 << 31); + wrhcr_el2(hcr); + + // Set up SPSR for EL1 + // XXX: Magic constant I have not bothered to understand + wrspsr_el2(0x3c4); + + enter_el1(ctx, entry); + case el::EL3 => + // Not supported, tested earlier on + abort("Unsupported boot configuration"); + }; +}; +``` + +Here we see the detritus from one of many battles I fought to port this kernel: +the EL2 => EL1 transition. aarch64 has several "exception levels", which are +semantically similar to the x86\_64 concept of protection rings. EL0 is used for +userspace code, which is not applicable under these circumstances; an assertion +sanity-checks this invariant. 
EL1 is the simplest case, this is used for normal +kernel code and in this situation we can jump directly to the kernel. The EL2 +case is used for hypervisor code, and this presented me with a challenge. When I +tested my bootloader in qemu-virt, it worked initially, but on real hardware it +failed. After much wailing and gnashing of teeth, the cause was found to be that +our bootloader was started in EL2 on real hardware, and EL1 on qemu-virt. qemu +can be configured to boot in EL2, which was crucial in debugging this problem, +via -M virt,virtualization=on. From this environment I was able to identify a +few important steps to drop to EL1 and into the kernel, though from the comments +you can probably ascertain that this process was not well-understood. I do have +a better understanding of it now than I did when this code was written, but the +code is still serviceable and I see no reason to change it at this stage. + +At this point, 14 days into the port, I successfully reached kmain on qemu-virt. +Some initial kernel porting work was done after this, but when I was prepared to +test it on real hardware I ran into this EL2 problem &mdash; the first kmain on +real hardware ran at T+18. + +That sums it up for the aarch64 EFI bootloader work. 24 days later the kernel +and userspace ports would be complete, and a couple of weeks after that it was +running on stage at FOSDEM. The next post will cover the kernel port (maybe more +than one post will be required, we'll see), and the final post will address the +userspace port and the inner workings of the slidedeck demo that was shown on +stage. Look forward to it, and thanks for reading!