When to comment that code - drewdevault.com - [mirror] blog and personal website of Drew DeVault

commit: ab7ff2821dcb9747596aea7f55af8cf415ca6453
parent 5b3d635865b12206a112946d2f2c1a79a77be6d8
Author: Drew DeVault <sir@cmpwn.com>
Date:   Thu,  9 Mar 2023 12:18:45 +0100

When to comment that code

Diffstat:
A content/blog/2023-03-09-Comment-or-no-comment.md 234 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

1 file changed, 234 insertions(+), 0 deletions(-)
diff --git a/content/blog/2023-03-09-Comment-or-no-comment.md b/content/blog/2023-03-09-Comment-or-no-comment.md
@@ -0,0 +1,234 @@
+---
+title: When to comment that code
+date: 2023-03-09
+---
+
+My software tends to have a surprisingly low number of comments. One of my
+projects, [scdoc], has 25 comments among its 1,133 lines of C code, or 2%,
+compared to the average of 19%.[^1] Naturally, I insist that my code is
+well-written in spite of this divergence from the norm. Allow me to explain.
+
+[scdoc]: https://git.sr.ht/~sircmpwn/scdoc
+[^1]: O. Arafat and D. Riehle, "The comment density of open source software code," 2009 31st International Conference on Software Engineering - Companion Volume, Vancouver, BC, Canada, 2009, pp. 195-198, doi: 10.1109/ICSE-COMPANION.2009.5070980.
+
+The philosophy and implementation of code comments varies widely in the
+industry, and some view comment density as a proxy for code quality.[^2] I'll
+state my views here, but will note that yours may differ and I find that
+acceptable; I am not here to suggest that your strategy is wrong and I will
+happily adopt it when I write a patch for your codebase.
+
+[^2]: I hold this view weakly, but reverse of the norm: I consider a high
+  comment density a sign that the code quality may be poor.
+
+Let's begin with an illustrative example from one of my projects:
+
+```hare
+// Reads the next entry from an EFI [[FILE_PROTOCOL]] handle of an open
+// directory. The return value is statically allocated and will be overwritten
+// on the next call.
+export fn readdir(dir: *FILE_PROTOCOL) (*FILE_INFO | void | error) = {
+	// size(FILE_INFO) plus reserve up to 512 bytes for file name (FAT32
+	// maximum, times two for wstr encoding)
+	static let buf: [FILE_INFO_SIZE + 512]u8 = [0...];
+	const n = read(dir, buf)?;
+	if (n == 0) {
+		return;
+	};
+	return &buf[0]: *FILE_INFO;
+};
+```
+
+This code illustrates two of my various approaches to writing comments. The
+first comment is a documentation comment: the intended audience is the consumer
+of this API. The call-site has access to the following information:
+
+- This comment
+- The name of the function, and the module in which it resides (efi::readdir)
+- The parameter names and types
+- The return type
+
+The goal is for the user of this function to gather enough information from
+these details to correctly utilize this API.
+
+The module in which it resides suggests that this function interacts with the
+EFI (Extensible Firmware Interface) standard, and the user would be wise to pair
+a reading of this code (or API) with skimming the relevant standard. Indeed, the
+strategic naming of the FILE\_PROTOCOL and FILE\_INFO types (notably written in
+defiance of the Hare style guide), provide hints to the relevant parts of the
+EFI specification to read for a complete understanding of this code.
+
+The name of the function is also carefully chosen to carry some weight: it is a
+reference to the Unix readdir function, which brings with it an intuition about
+its purpose and usage for programmers familiar with a Unix environment.
+
+The return type also provides hints about the function's use: it may return
+either a FILE\_INFO pointer, void (nothing), or an error. Without reading the
+documentation string, and taking the name and return type into account, we might
+(correctly) surmise that we need to call this function repeatedly to read file
+details out of a directory until it returns void, indicating that all entries
+have been processed, handling any errors which might occur along the way.
+
+We have established a lot of information about this function without actually
+reading the comment; in my philosophy of programming I view this information as
+a critical means for the author to communicate to the user, and we can lean on
+it to reduce the need for explicit documentation. Nevertheless, the
+documentation comment adds something here. The first sentence is a relatively
+information-sparse summary of the function's purpose, and mainly exists to tick
+a box in the Hare style guide.[^3] The second sentence is the only real reason
+this comment exists: to clarify an important detail for the user which is not
+apparent from the function signature, namely the storage semantics associated
+with the return value.
+
+[^3]: Which states that all exported functions that the module consumer is
+  expected to use should have a comment, and that exported but undocumented
+  symbols are exported to fulfill an implementation detail and not to provide a
+  useful interface.
+
+Let's now study the second comment's purpose:
+
+```hare
+// size(FILE_INFO) plus reserve up to 512 bytes for file name (FAT32
+// maximum, times two for wstr encoding)
+static let buf: [FILE_INFO_SIZE + 512]u8 = [0...];
+```
+
+This comment exists to explain the use of the magic constant of 512. The
+audience of this comment is someone reading the *implementation* of this
+function. This audience has access to a different context than the user of the
+function, for instance they are expected to have a more comprehensive knowledge
+of EFI and are *definitely* expected to be reading the specification to a much
+greater degree of detail. We can and should lean on that context to make our
+comments more concise and useful.
+
+An alternative writing which does not rely on this context, and which in
+my view is strictly worse, may look like the following:
+
+```hare
+// The FILE_INFO structure includes the file details plus a variable length
+// array for the filename. The underlying filesystem is always FAT32 per the
+// EFI specification, which has a maximum filename length of 256 characters. The
+// filename is encoded as a wide-string (UCS-2), which encodes two bytes per
+// character, and is not NUL-terminated, so we need to reserve up to 512 bytes
+// for the filename.
+static let buf: [FILE_INFO_SIZE + 512]u8 = [0...];
+```
+
+The target audience of this comment should have a reasonable understanding of
+EFI. We simply need to clarify that this constant is the FAT32 max filename
+length, times two to account for the wstr encoding, and our magic constant is
+sufficiently explained.
+
+Let's move on to another kind of comment I occasionally write: medium-length
+prose. These often appear at the start of a function or the start of a file and
+serve to add context to the implementation, to justify the code's existence or
+explain why it works. Another sample:
+
+```hare
+fn init_pagetables() void = {
+	// 0xFFFF0000xxxxxxxx - 0xFFFF0200xxxxxxxx: identity map
+	// 0xFFFF0200xxxxxxxx - 0xFFFF0400xxxxxxxx: identity map (dev)
+	// 0xFFFF8000xxxxxxxx - 0xFFFF8000xxxxxxxx: kernel image
+	//
+	// L0[0x000]    => L1_ident
+	// L0[0x004]    => L1_devident
+	// L1_ident[*]  => 1 GiB identity mappings
+	// L0[0x100]    => L1_kernel
+	// L1_kernel[0] => L2_kernel
+	// L2_kernel[0] => L3_kernel
+	// L3_kernel[0] => 4 KiB kernel pages
+	L0[0x000] = PT_TABLE | &L1_ident: uintptr | PT_AF;
+	L0[0x004] = PT_TABLE | &L1_devident: uintptr | PT_AF;
+	L0[0x100] = PT_TABLE | &L1_kernel: uintptr | PT_AF;
+	L1_kernel[0] = PT_TABLE | &L2_kernel: uintptr | PT_AF;
+	L2_kernel[0] = PT_TABLE | &L3_kernel: uintptr | PT_AF;
+	for (let i = 0u64; i < len(L1_ident): u64; i += 1) {
+		L1_ident[i] = PT_BLOCK | (i * 0x40000000): uintptr |
+			PT_NORMAL | PT_AF | PT_ISM | PT_RW;
+	};
+	for (let i = 0u64; i < len(L1_devident): u64; i += 1) {
+		L1_devident[i] = PT_BLOCK | (i * 0x40000000): uintptr |
+			PT_DEVICE | PT_AF | PT_ISM | PT_RW;
+	};
+};
+```
+
+This comment shares a trait with the previous example: its purpose, in part, is
+to justify magic constants. It explains the indices of the arrays by way of the
+desired address space, and a perceptive reader will notice that 1 GiB =
+1073741824 bytes = 0x40000000 bytes.
+
+To fully understand this, we must again consider the intended audience. This
+is an implementation comment, so the reader is an *implementer*. They will need
+to possess some familiarity with the behavior of page tables to be productive in
+this code, and they likely have the ARM manual up on their second monitor. This
+comment simply fills in the blanks for an informed reader.
+
+There are two additional kinds of comments I often write: TODO and XXX.
+
+A TODO comment indicates some important implementation deficiency; it *must* be
+addressed at some point in the future and generally indicates that the function
+does not meet its stated interface and is often accompanied by an assertion, or
+a link to a ticket on the bug tracker, or both.
+
+```hare
+assert(ep.send == null); // TODO: support multiple senders
+```
+
+This function should support multiple senders, but does not; an assertion here
+prevents the code from running under conditions it does not yet support and the
+TODO comment indicates that this should be addressed in the future. The target
+audience for this comment is someone who brings about these conditions and runs
+into the assertion failure.
+
+```hare
+fn memory_empty(mem: *memory) bool = {
+	// XXX: This O(n) linked list traversal is bad
+	let next = mem.next;
+	let pages = 0u;
+	for (next != FREELIST_END; pages += 1) {
+		const addr = mem.phys + (next * mem::PAGESIZE): uintptr;
+		const ptr = mem::phys_tokernel(addr): *uint;
+		next = *ptr;
+	};
+	return pages == mem.pages;
+};
+```
+
+Here we find an example of an XXX comment. This code is correct: it implements
+the function's interface perfectly. However, given its expected usage, a
+performance of O(n) is not great: this function is expected to be used in hot
+paths. This comment documents the deficiency, and provides a hint to a reader
+that might be profiling this code in regards to a possible improvement.
+
+One final example:
+
+```hare
+// Invalidates the TLB for a virtual address.
+export fn invalidate(virt: uintptr) void = {
+	// TODO: Notify other cores (XXX SMP)
+	invlpg(virt);
+};
+```
+
+This is an atypical usage of XXX, but one which I still occasionally reach for.
+Here we have a TODO comment which indicates a case which this code does not
+consider, but which must be addressed in the future: it will have to raise an
+IPI to get other cores to invalidate the affected virtual address. However, this
+is one of many changes which fall under a broader milestone of SMP support, and
+the "XXX SMP" comment is here to make it easy to grep through the codebase for
+any places which are known to require attention while implementing SMP support.
+An XXX comment is often written for the purpose of being easily found with grep.
+
+That sums up most of the common reasons I will write a comment in my software.
+Each comment is written considering a target audience and the context provided
+by the code in which it resides, and aims to avoid stating redundant information
+within these conditions. It's for this reason that my code is sparse on
+comments: I find the information outside of the comments equally important and
+aim to be concise such that a comment is not redundant with information found
+elsewhere.
+
+Hopefully this post inspired some thought in you, to consider your comments
+deliberately and to be more aware of your ability to communicate information in
+other ways. Even if you chose to write your comments more densely than I do, I
+hope you will take care to communicate well through other mediums in your code
+as well.