Helios aarch64 part 1 - drewdevault.com - [mirror] blog and personal website of Drew DeVault

commit: 5b3d635865b12206a112946d2f2c1a79a77be6d8
parent 0dafcf4c5c22faedf7f70d238aef302e4bcb653b
Author: Drew DeVault <sir@cmpwn.com>
Date:   Mon, 20 Feb 2023 15:16:52 +0100

Helios aarch64 part 1

Diffstat:
A content/blog/2023-02-20-Helios-aarch64.md 682 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

1 file changed, 682 insertions(+), 0 deletions(-)
diff --git a/content/blog/2023-02-20-Helios-aarch64.md b/content/blog/2023-02-20-Helios-aarch64.md
@@ -0,0 +1,682 @@
+---
+title: Porting Helios to aarch64 for my FOSDEM talk, part one
+date: 2023-02-20
+---
+
+[Helios] is a microkernel written in the [Hare] programming language, and the
+subject of a talk I did at FOSDEM earlier this month. You can watch the talk
+here if you like:
+
+[Helios]: https://sr.ht/~sircmpwn/helios
+[Hare]: https://harelang.org
+
+<iframe title="FOSDEM 2023: Introducing the Helios microkernel" src="https://spacepub.space/videos/embed/f6435a6c-34e0-4602-ad5d-f791643111ab" allowfullscreen="" sandbox="allow-same-origin allow-scripts allow-popups" width="560" height="315" frameborder="0"></iframe>
+
+A while ago I promised someone that I would not do any talks on Helios until I
+could present them from Helios itself, and at FOSDEM I made good on that
+promise: my talk was presented from a Raspberry Pi 4 running Helios. The kernel
+was originally designed for x86\_64 (though we were careful to avoid painting
+ourselves into any corners so that we could port it to more architectures later
+on), and I initially planned to write an Intel HD Graphics driver so that I
+could drive the projector from my laptop. But, after a few days spent trying to
+comprehend the IHD manuals, I decided it would be *much* easier to port the
+entire system to aarch64 and write a driver for the much-simpler RPi GPU
+instead. 42 days later the port was complete, and a week or so after that I
+successfully presented the talk at FOSDEM. In a series of blog posts, I will
+take a look at those 42 days of work and explain how the aarch64 port works.
+Today's post focuses on the bootloader.
+
+The Helios boot-up process is:
+
+1. Bootloader starts up and loads the kernel, then jumps to it
+2. The kernel configures the system and loads the init process
+3. Kernel provides runtime services to init (and any subsequent processes)
+
+In theory, the port to aarch64 would address these steps in order, but in
+practice step (2) relies heavily on the runtime services provided by step (3),
+so much of the work was ordered 1, 3, 2. This blog post focuses on part 1, I'll
+cover parts 2 and 3 and all of the fun problems they caused in later posts.
+
+In any case, the bootloader was the first step. Some basic changes to the build
+system established boot/+aarch64 as the aarch64 bootloader, and a simple
+qemu-specific ARM kernel was prepared which just gave a little "hello world" to
+demonstrate the multi-arch build system was working as intended. More build
+system refinements would come later, but it's off to the races from here.
+Targeting qemu's aarch64 virt platform was useful for most of the initial
+debugging and bring-up (and is generally useful at all times, as a much easier
+platform to debug than real hardware); the first tests on real hardware came
+much later.
+
+Booting up is a sore point on most systems. It involves a lot of arch-specific
+procedures, but also generally calls for custom binary formats and annoying
+things like disk drivers &mdash; which don't belong in a microkernel. So the
+Helios bootloaders are separated from the kernel proper, which is a simple ELF
+executable. The bootloader loads this ELF file into memory, configures a few
+simple things, then passes some information along to the kernel entry point. The
+bootloader's memory and other resources are hereafter abandoned and are later
+reclaimed for general use.
+
+On aarch64 the boot story is pretty abysmal, and I wanted to avoid adding the
+SoC-specific complexity which is endemic to the platform. Thus, two solutions
+are called for: [EFI] and [device trees]. At the bootloader level, EFI is the
+more important concern. For qemu-virt and Raspberry Pi, [edk2] is the
+free-software implementation of choice when it comes to EFI. The first order of
+business is producing an executable which can be loaded by EFI, which is, rather
+unfortunately, based on the Windows [COFF/PE32+] format. I took inspiration from
+Linux and made an disgusting EFI stub solution, which involves hand-writing a
+PE32+ header in assembly and doing some truly horrifying things with binutils to
+massage everything into order. Much of the header is lifted from Linux:
+
+[EFI]: https://uefi.org/specifications
+[device trees]: https://www.devicetree.org/specifications/
+[edk2]: https://github.com/tianocore/edk2
+[COFF/PE32+]: https://learn.microsoft.com/en-us/windows/win32/debug/pe-format
+
+```
+.section .text.head
+.global base
+base:
+.L_head:
+	/* DOS header */
+	.ascii "MZ"
+	.skip 58
+	.short .Lpe_header - .L_head
+	.align 4
+.Lpe_header:
+	.ascii "PE\0\0"
+	.short 0xAA64                              /* Machine = AARCH64 */
+	.short 2                                   /* NumberOfSections */
+	.long 0                                    /* TimeDateStamp */
+	.long 0                                    /* PointerToSymbolTable */
+	.long 0                                    /* NumberOfSymbols */
+	.short .Lsection_table - .Loptional_header /* SizeOfOptionalHeader */
+	/* Characteristics:
+	 * IMAGE_FILE_EXECUTABLE_IMAGE |
+	 * IMAGE_FILE_LINE_NUMS_STRIPPED |
+	 * IMAGE_FILE_DEBUG_STRIPPED */
+	.short 0x206
+.Loptional_header:
+	.short 0x20b                     /* Magic = PE32+ (64-bit) */
+	.byte 0x02                       /* MajorLinkerVersion */
+	.byte 0x14                       /* MinorLinkerVersion */
+	.long _data - .Lefi_header_end   /* SizeOfCode */
+	.long __pecoff_data_size         /* SizeOfInitializedData */
+	.long 0                          /* SizeOfUninitializedData */
+	.long _start - .L_head           /* AddressOfEntryPoint */
+	.long .Lefi_header_end - .L_head /* BaseOfCode */
+.Lextra_header:
+	.quad 0                          /* ImageBase */
+	.long 4096                       /* SectionAlignment */
+	.long 512                        /* FileAlignment */
+	.short 0                         /* MajorOperatingSystemVersion */
+	.short 0                         /* MinorOperatingSystemVersion */
+	.short 0                         /* MajorImageVersion */
+	.short 0                         /* MinorImageVersion */
+	.short 0                         /* MajorSubsystemVersion */
+	.short 0                         /* MinorSubsystemVersion */
+	.long 0                          /* Reserved */
+
+	.long _end - .L_head             /* SizeOfImage */
+
+	.long .Lefi_header_end - .L_head /* SizeOfHeaders */
+	.long 0                          /* CheckSum */
+	.short 10                        /* Subsystem = EFI application */
+	.short 0                         /* DLLCharacteristics */
+	.quad 0                          /* SizeOfStackReserve */
+	.quad 0                          /* SizeOfStackCommit */
+	.quad 0                          /* SizeOfHeapReserve */
+	.quad 0                          /* SizeOfHeapCommit */
+	.long 0                          /* LoaderFlags */
+	.long 6                          /* NumberOfRvaAndSizes */
+
+	.quad 0 /* Export table */
+	.quad 0 /* Import table */
+	.quad 0 /* Resource table */
+	.quad 0 /* Exception table */
+	.quad 0 /* Certificate table */
+	.quad 0 /* Base relocation table */
+
+.Lsection_table:
+	.ascii ".text\0\0\0"              /* Name */
+	.long _etext - .Lefi_header_end   /* VirtualSize */
+	.long .Lefi_header_end - .L_head  /* VirtualAddress */
+	.long _etext - .Lefi_header_end   /* SizeOfRawData */
+	.long .Lefi_header_end - .L_head  /* PointerToRawData */
+	.long 0                           /* PointerToRelocations */
+	.long 0                           /* PointerToLinenumbers */
+	.short 0                          /* NumberOfRelocations */
+	.short 0                          /* NumberOfLinenumbers */
+	/* IMAGE_SCN_CNT_CODE | IMAGE_SCN_MEM_READ | IMAGE_SCN_MEM_EXECUTE */
+	.long 0x60000020
+
+	.ascii ".data\0\0\0"        /* Name */
+	.long __pecoff_data_size    /* VirtualSize */
+	.long _data - .L_head       /* VirtualAddress */
+	.long __pecoff_data_rawsize /* SizeOfRawData */
+	.long _data - .L_head       /* PointerToRawData */
+	.long 0                     /* PointerToRelocations */
+	.long 0                     /* PointerToLinenumbers */
+	.short 0                    /* NumberOfRelocations */
+	.short 0                    /* NumberOfLinenumbers */
+	/* IMAGE_SCN_CNT_INITIALIZED_DATA | IMAGE_SCN_MEM_READ | IMAGE_SCN_MEM_WRITE */
+	.long 0xc0000040
+
+.balign 0x10000
+.Lefi_header_end:
+
+.global _start
+_start:
+	stp x0, x1, [sp, -16]!
+
+	adrp x0, base
+	add x0, x0, #:lo12:base
+	adrp x1, _DYNAMIC
+	add x1, x1, #:lo12:_DYNAMIC
+	bl relocate
+	cmp w0, #0
+	bne 0f
+
+	ldp x0, x1, [sp], 16
+
+	b bmain
+
+0:
+	/* relocation failed */
+	add sp, sp, -16
+	ret
+```
+
+The specific details about how any of this works are complex and unpleasant,
+I'll refer you to the spec if you're curious, and offer a general suggestion
+that cargo-culting my work here would be a lot easier than understanding it
+should you need to build something similar.[^1]
+
+[^1]: A cursory review of this code while writing this blog post draws my
+  attention to a few things that ought to be improved as well.
+
+Note the entry point for later; we store two arguments from EFI (x0 and x1) on
+the stack and eventually branch to bmain.
+
+This file is assisted by the linker script:
+
+```
+ENTRY(_start)
+OUTPUT_FORMAT(elf64-littleaarch64)
+
+SECTIONS {
+	/DISCARD/ : {
+		*(.rel.reloc)
+		*(.eh_frame)
+		*(.note.GNU-stack)
+		*(.interp)
+		*(.dynsym .dynstr .hash .gnu.hash)
+	}
+
+	. = 0xffff800000000000;
+
+	.text.head : {
+		_head = .;
+		KEEP(*(.text.head))
+	}
+
+	.text : ALIGN(64K) {
+		_text = .;
+		KEEP(*(.text))
+		*(.text.*)
+		. = ALIGN(16);
+		*(.got)
+	}
+
+	. = ALIGN(64K);
+	_etext = .;
+
+	.dynamic : {
+		*(.dynamic)
+	}
+
+	.data : ALIGN(64K) {
+		_data = .;
+		KEEP(*(.data))
+		*(.data.*)
+
+		/* Reserve page tables */
+		. = ALIGN(4K);
+		L0 = .;
+		. += 512 * 8;
+		L1_ident = .;
+		. += 512 * 8;
+		L1_devident = .;
+		. += 512 * 8;
+		L1_kernel = .;
+		. += 512 * 8;
+		L2_kernel = .;
+		. += 512 * 8;
+		L3_kernel = .;
+		. += 512 * 8;
+	}
+
+	.rela.text : {
+		*(.rela.text)
+		*(.rela.text*)
+	}
+	.rela.dyn : {
+		*(.rela.dyn)
+	}
+	.rela.plt : {
+		*(.rela.plt)
+	}
+	.rela.got : {
+		*(.rela.got)
+	}
+	.rela.data : {
+		*(.rela.data)
+		*(.rela.data*)
+	}
+
+	.pecoff_edata_padding : {
+		BYTE(0);
+		. = ALIGN(512);
+	}
+	__pecoff_data_rawsize = ABSOLUTE(. - _data);
+	_edata = .;
+
+	.bss : ALIGN(4K) {
+		KEEP(*(.bss))
+		*(.bss.*)
+		*(.dynbss)
+	}
+
+	. = ALIGN(64K);
+	__pecoff_data_size = ABSOLUTE(. - _data);
+	_end = .;
+}
+```
+
+Items of note here are the careful treatment of relocation sections
+(cargo-culted from earlier work on RISC-V with Hare; not actually necessary as
+qbe generates PIC for aarch64)[^2] and the extra symbols used to gather
+information for the PE32+ header. Padding is also added in the required places,
+and static aarch64 page tables are defined for later use.
+
+[^2]: PIC stands for "position independent code". EFI can load executables at
+  any location in memory and the code needs to be prepared to deal with that;
+  PIC is the tool we use for this purpose.
+
+This is built as a shared object, and the Makefile ~~mutilates~~ reformats the
+resulting ELF file to produce a PE32+ executable:
+
+```
+$(BOOT)/bootaa64.so: $(BOOT_OBJS) $(BOOT)/link.ld
+	$(LD) -Bsymbolic -shared --no-undefined \
+		-T $(BOOT)/link.ld \
+		$(BOOT_OBJS) \
+		-o $@
+
+$(BOOT)/bootaa64.efi: $(BOOT)/bootaa64.so
+	$(OBJCOPY) -Obinary \
+		-j .text.head -j .text -j .dynamic -j .data \
+		-j .pecoff_edata_padding \
+		-j .dynstr -j .dynsym \
+		-j .rel -j .rel.* -j .rel* \
+		-j .rela -j .rela.* -j .rela* \
+		$< $@
+```
+
+With all of this mess sorted, and the PE32+ entry point branching to bmain, we
+can finally enter some Hare code:
+
+```
+export fn bmain(
+	image_handle: efi::HANDLE,
+	systab: *efi::SYSTEM_TABLE,
+) efi::STATUS = {
+    // ...
+};
+```
+
+Getting just this far took 3 full days of work.
+
+Initially, the Hare code incorporated a lot of proof-of-concept work from Alexey
+Yerin's "carrot" kernel prototype for RISC-V, which also booted via EFI.
+Following the early bringing-up of the bootloader environment, this was
+refactored into a more robust and general-purpose EFI support layer for Helios,
+which will be applicable to future ports. You can review the EFI support
+module's haredocs [here](https://mirror.drewdevault.com/efi.html). The purpose
+of this module is to provide an idiomatic Hare-oriented interface to the EFI
+boot services, which the bootloader makes use of mainly to read files from the
+boot media and examine the system's memory map.
+
+Let's take a look at the first few lines of bmain:
+
+```
+efi::init(image_handle, systab)!;
+
+const eficons = eficons_init(systab);
+log::setcons(&eficons);
+log::printfln("Booting Helios aarch64 via EFI");
+
+if (readel() == el::EL3) {
+	log::printfln("Booting from EL3 is not supported");
+	return efi::STATUS::LOAD_ERROR;
+};
+
+let mem = allocator { ... };
+init_mmap(&mem);
+init_pagetables();
+```
+
+Significant build system overhauls were required such that Hare modules from
+the kernel like log (and, later, other modules like elf) could be incorporated
+into the bootloader, simplifying the process of implementing more complex
+bootloaders. The first call of note here is init\_mmap, which scans the EFI
+memory map and prepares a simple high-watermark allocator to be used by the
+bootloader to allocate memory for the kernel image and other items of interest.
+It's quite simple, it just finds the largest area of general-purpose memory and
+sets up an allocator with it:
+
+```
+// Loads the memory map from EFI and initializes a page allocator using the
+// largest area of physical memory.
+fn init_mmap(mem: *allocator) void = {
+	const iter = efi::iter_mmap()!;
+	let maxphys: uintptr = 0, maxpages = 0u64;
+	for (true) {
+		const desc = match (efi::mmap_next(&iter)) {
+		case let desc: *efi::MEMORY_DESCRIPTOR =>
+			yield desc;
+		case void =>
+			break;
+		};
+		if (desc.DescriptorType != efi::MEMORY_TYPE::CONVENTIONAL) {
+			continue;
+		};
+		if (desc.NumberOfPages > maxpages) {
+			maxphys = desc.PhysicalStart;
+			maxpages = desc.NumberOfPages;
+		};
+	};
+	assert(maxphys != 0, "No suitable memory area found for kernel loader");
+	assert(maxpages <= types::UINT_MAX);
+	pagealloc_init(mem, maxphys, maxpages: uint);
+};
+```
+
+init\_pagetables is next. This populates the page tables reserved by the linker
+with the desired higher-half memory map, illustrated in the comments shown here:
+
+```
+fn init_pagetables() void = {
+	// 0xFFFF0000xxxxxxxx - 0xFFFF0200xxxxxxxx: identity map
+	// 0xFFFF0200xxxxxxxx - 0xFFFF0400xxxxxxxx: identity map (dev)
+	// 0xFFFF8000xxxxxxxx - 0xFFFF8000xxxxxxxx: kernel image
+	//
+	// L0[0x000]    => L1_ident
+	// L0[0x004]    => L1_devident
+	// L1_ident[*]  => 1 GiB identity mappings
+	// L0[0x100]    => L1_kernel
+	// L1_kernel[0] => L2_kernel
+	// L2_kernel[0] => L3_kernel
+	// L3_kernel[0] => 4 KiB kernel pages
+	L0[0x000] = PT_TABLE | &L1_ident: uintptr | PT_AF;
+	L0[0x004] = PT_TABLE | &L1_devident: uintptr | PT_AF;
+	L0[0x100] = PT_TABLE | &L1_kernel: uintptr | PT_AF;
+	L1_kernel[0] = PT_TABLE | &L2_kernel: uintptr | PT_AF;
+	L2_kernel[0] = PT_TABLE | &L3_kernel: uintptr | PT_AF;
+	for (let i = 0u64; i < len(L1_ident): u64; i += 1) {
+		L1_ident[i] = PT_BLOCK | (i * 0x40000000): uintptr |
+			PT_NORMAL | PT_AF | PT_ISM | PT_RW;
+	};
+	for (let i = 0u64; i < len(L1_devident): u64; i += 1) {
+		L1_devident[i] = PT_BLOCK | (i * 0x40000000): uintptr |
+			PT_DEVICE | PT_AF | PT_ISM | PT_RW;
+	};
+};
+```
+
+In short, we want three larger memory regions to be available: an identity map,
+where physical memory addresses correlate 1:1 with virtual memory, an identity
+map configured for device MMIO (e.g. with caching disabled), and an area to load
+the kernel image. The first two are straightforward, they use uniform 1 GiB
+mappings to populate their respective page tables. The latter is slightly more
+complex, ultimately the kernel is loaded in 4 KiB pages so we need to set up
+intermediate page tables for that purpose.
+
+We cannot actually enable these page tables until we're finished making use of
+the EFI boot services &mdash; the EFI specification requires us to preserve the
+online memory map at this stage of affairs. However, this does lay the
+groundwork for the kernel loader: we have an allocator to provide pages of
+memory, and page tables to set up virtual memory mappings that can be activated
+once we're done with EFI. bmain thus proceeds with loading the kernel:
+
+```
+const kernel = match (efi::open("\\helios", efi::FILE_MODE::READ)) {
+case let file: *efi::FILE_PROTOCOL =>
+	yield file;
+case let err: efi::error =>
+	log::printfln("Error: no kernel found at /helios");
+	return err: efi::STATUS;
+};
+
+log::printfln("Load kernel /helios");
+const kentry = match (load(&mem, kernel)) {
+case let err: efi::error =>
+	return err: efi::STATUS;
+case let entry: uintptr =>
+	yield entry: *kentry;
+};
+efi::close(kernel)!;
+```
+
+The loader itself (the "load" function here) is a relatively straightforward ELF
+loader; if you've seen one you've seen them all. Nevertheless, you may browse it
+[online][0] if you so wish. The only item of note here is the function used for
+mapping kernel pages:
+
+[0]: https://git.sr.ht/~sircmpwn/helios/tree/02d0490487c7a0fb4b0367b95819e808b98f87fb/item/boot/%2Baarch64/loader.ha
+
+```
+// Maps a physical page into the kernel's virtual address space.
+fn kmmap(virt: uintptr, phys: uintptr, flags: uintptr) void = {
+	assert(virt & ~0x1ff000 == 0xffff800000000000: uintptr);
+	const offs = (virt >> 12) & 0x1ff;
+	L3_kernel[offs] = PT_PAGE | PT_NORMAL | PT_AF | PT_ISM | phys | flags;
+};
+```
+
+The assertion enforces a constraint which is implemented by our kernel linker
+script, namely that all loadable kernel program headers are located within the
+kernel's reserved address space. With this constraint in place, the
+implementation is simpler than many mmap implementations; we can assume that
+L3\_kernel is the correct page table and just load it up with the desired
+physical address and mapping flags.
+
+Following the kernel loader, the bootloader addresses other items of interest,
+such as loading the device tree and boot modules &mdash; which includes, for
+instance, the init process image and an initramfs. It also allocates & populates
+data structures with information which will be of later use to the kernel,
+including the memory map. This code is relatively straightforward and not
+particularly interesting; most of these processes takes advantage of the same
+straightforward Hare function:
+
+```
+// Loads a file into continuous pages of memory and returns its physical
+// address.
+fn load_file(
+	mem: *allocator,
+	file: *efi::FILE_PROTOCOL,
+) (uintptr | efi::error) = {
+	const info = efi::file_info(file)?;
+	const fsize = info.FileSize: size;
+	let npage = fsize / PAGESIZE;
+	if (fsize % PAGESIZE != 0) {
+		npage += 1;
+	};
+
+	let base: uintptr = 0;
+	for (let i = 0z; i < npage; i += 1) {
+		const phys = pagealloc(mem);
+		if (base == 0) {
+			base = phys;
+		};
+
+		const nbyte = if ((i + 1) * PAGESIZE > fsize) {
+			yield fsize % PAGESIZE;
+		} else {
+			yield PAGESIZE;
+		};
+		let dest = (phys: *[*]u8)[..nbyte];
+		const n = efi::read(file, dest)?;
+		assert(n == nbyte);
+	};
+
+	return base;
+};
+```
+
+It is not necessary to map these into virtual memory anywhere, the kernel later
+uses the identity-mapped physical memory region in the higher half to read
+them. Tasks of interest resume at the end of bmain:
+
+```
+efi::exit_boot_services();
+init_mmu();
+enter_kernel(kentry, ctx);
+```
+
+Once we exit boot services, we are free to configure the MMU according to our
+desired specifications and make good use of all of the work done earlier to
+prepare a kernel memory map. Thus, init\_mmu:
+
+```
+// Initializes the ARM MMU to our desired specifications. This should take place
+// *after* EFI boot services have exited because we're going to mess up the MMU
+// configuration that it depends on.
+fn init_mmu() void = {
+	// Disable MMU
+	const sctlr_el1 = rdsctlr_el1();
+	wrsctlr_el1(sctlr_el1 & ~SCTLR_EL1_M);
+
+	// Configure MAIR
+	const mair: u64 =
+		(0xFF << 0) | // Attr0: Normal memory; IWBWA, OWBWA, NTR
+		(0x00 << 8);  // Attr1: Device memory; nGnRnE, OSH
+	wrmair_el1(mair);
+
+	const tsz: u64 = 64 - 48;
+	const ips = rdtcr_el1() & TCR_EL1_IPS_MASK;
+	const tcr_el1: u64 =
+		TCR_EL1_IPS_42B_4T |	// 4 TiB IPS
+		TCR_EL1_TG1_4K |	// Higher half: 4K granule size
+		TCR_EL1_SH1_IS |	// Higher half: inner shareable
+		TCR_EL1_ORGN1_WB |	// Higher half: outer write-back
+		TCR_EL1_IRGN1_WB |	// Higher half: inner write-back
+		(tsz << TCR_EL1_T1SZ) |	// Higher half: 48 bits
+		TCR_EL1_TG0_4K |	// Lower half: 4K granule size
+		TCR_EL1_SH0_IS |	// Lower half: inner sharable
+		TCR_EL1_ORGN0_WB |	// Lower half: outer write-back
+		TCR_EL1_IRGN0_WB |	// Lower half: inner write-back
+		(tsz << TCR_EL1_T0SZ);	// Lower half: 48 bits
+	wrtcr_el1(tcr_el1);
+
+	// Load page tables
+	wrttbr0_el1(&L0[0]: uintptr);
+	wrttbr1_el1(&L0[0]: uintptr);
+	invlall();
+
+	// Enable MMU
+	const sctlr_el1: u64 =
+		SCTLR_EL1_M |		// Enable MMU
+		SCTLR_EL1_C |		// Enable cache
+		SCTLR_EL1_I |		// Enable instruction cache
+		SCTLR_EL1_SPAN |	// SPAN?
+		SCTLR_EL1_NTLSMD |	// NTLSMD?
+		SCTLR_EL1_LSMAOE |	// LSMAOE?
+		SCTLR_EL1_TSCXT |	// TSCXT?
+		SCTLR_EL1_ITD;		// ITD?
+	wrsctlr_el1(sctlr_el1);
+};
+```
+
+There are a lot of bits here! Figuring out which ones to enable or disable was a
+project in and of itself. One of the major challenges, funnily enough, was
+finding the correct ARM manual to reference to understand all of these
+registers. I'll save you some time and [link to it][1] directly, should you ever
+find yourself writing similar code. Some question marks in comments towards the
+end point out some flags that I'm still not sure about. The ARM CPU is *very*
+configurable and identifying the configuration that produces the desired
+behavior for a general-purpose kernel requires some effort.
+
+[1]: https://mirror.drewdevault.com/ARMARM.pdf
+
+After this function completes, the MMU is initialized and we are up and running
+with the kernel memory map we prepared earlier; the kernel is loaded in the
+higher half and the MMU is prepared to service it. So, we can jump to the kernel
+via enter\_kernel:
+
+```
+@noreturn fn enter_kernel(entry: *kentry, ctx: *bootctx) void = {
+	const el = readel();
+	switch (el) {
+	case el::EL0 =>
+		abort("Bootloader running in EL0, breaks EFI invariant");
+	case el::EL1 =>
+		// Can boot immediately
+		entry(ctx);
+	case el::EL2 =>
+		// Boot from EL2 => EL1
+		//
+		// This is the bare minimum necessary to get to EL1. Future
+		// improvements might be called for here if anyone wants to
+		// implement hardware virtualization on aarch64. Good luck to
+		// this future hacker.
+
+		// Enable EL1 access to the physical counter register
+		const cnt = rdcnthctl_el2();
+		wrcnthctl_el2(cnt | 0b11);
+
+		// Enable aarch64 in EL1 & SWIO, disable most other EL2 things
+		// Note: I bet someday I'll return to this line because of
+		// Problems
+		const hcr: u64 = (1 << 1) | (1 << 31);
+		wrhcr_el2(hcr);
+
+		// Set up SPSR for EL1
+		// XXX: Magic constant I have not bothered to understand
+		wrspsr_el2(0x3c4);
+
+		enter_el1(ctx, entry);
+	case el::EL3 =>
+		// Not supported, tested earlier on
+		abort("Unsupported boot configuration");
+	};
+};
+```
+
+Here we see the detritus from one of many battles I fought to port this kernel:
+the EL2 => EL1 transition. aarch64 has several "exception levels", which are
+semantically similar to the x86\_64 concept of protection rings. EL0 is used for
+userspace code, which is not applicable under these circumstances; an assertion
+sanity-checks this invariant. EL1 is the simplest case, this is used for normal
+kernel code and in this situation we can jump directly to the kernel. The EL2
+case is used for hypervisor code, and this presented me with a challenge. When I
+tested my bootloader in qemu-virt, it worked initially, but on real hardware it
+failed. After much wailing and gnashing of teeth, the cause was found to be that
+our bootloader was started in EL2 on real hardware, and EL1 on qemu-virt. qemu
+can be configured to boot in EL2, which was crucial in debugging this problem,
+via -M virt,virtualization=on. From this environment I was able to identify a
+few important steps to drop to EL1 and into the kernel, though from the comments
+you can probably ascertain that this process was not well-understood. I do have
+a better understanding of it now than I did when this code was written, but the
+code is still serviceable and I see no reason to change it at this stage.
+
+At this point, 14 days into the port, I successfully reached kmain on qemu-virt.
+Some initial kernel porting work was done after this, but when I was prepared to
+test it on real hardware I ran into this EL2 problem &mdash; the first kmain on
+real hardware ran at T+18.
+
+That sums it up for the aarch64 EFI bootloader work. 24 days later the kernel
+and userspace ports would be complete, and a couple of weeks after that it was
+running on stage at FOSDEM. The next post will cover the kernel port (maybe more
+than one post will be required, we'll see), and the final post will address the
+userspace port and the inner workings of the slidedeck demo that was shown on
+stage. Look forward to it, and thanks for reading!