logo

drewdevault.com

[mirror] blog and personal website of Drew DeVault git clone https://hacktivis.me/git/mirror/drewdevault.com.git

2023-03-09-Comment-or-no-comment.md (11017B)


  1. ---
  2. title: When to comment that code
  3. date: 2023-03-09
  4. ---
  5. My software tends to have a surprisingly low number of comments. One of my
  6. projects, [scdoc], has 25 comments among its 1,133 lines of C code, or 2%,
  7. compared to the average of 19%.[^1] Naturally, I insist that my code is
  8. well-written in spite of this divergence from the norm. Allow me to explain.
  9. [scdoc]: https://git.sr.ht/~sircmpwn/scdoc
  10. [^1]: O. Arafat and D. Riehle, "The comment density of open source software code," 2009 31st International Conference on Software Engineering - Companion Volume, Vancouver, BC, Canada, 2009, pp. 195-198, doi: 10.1109/ICSE-COMPANION.2009.5070980.
  11. The philosophy and implementation of code comments varies widely in the
  12. industry, and some view comment density as a proxy for code quality.[^2] I'll
  13. state my views here, but will note that yours may differ and I find that
  14. acceptable; I am not here to suggest that your strategy is wrong and I will
  15. happily adopt it when I write a patch for your codebase.
  16. [^2]: I hold this view weakly, but reverse of the norm: I consider a high
  17. comment density a sign that the code quality may be poor.
  18. Let's begin with an illustrative example from one of my projects:
  19. ```hare
  20. // Reads the next entry from an EFI [[FILE_PROTOCOL]] handle of an open
  21. // directory. The return value is statically allocated and will be overwritten
  22. // on the next call.
  23. export fn readdir(dir: *FILE_PROTOCOL) (*FILE_INFO | void | error) = {
  24. // size(FILE_INFO) plus reserve up to 512 bytes for file name (FAT32
  25. // maximum, times two for wstr encoding)
  26. static let buf: [FILE_INFO_SIZE + 512]u8 = [0...];
  27. const n = read(dir, buf)?;
  28. if (n == 0) {
  29. return;
  30. };
  31. return &buf[0]: *FILE_INFO;
  32. };
  33. ```
  34. This code illustrates two of my various approaches to writing comments. The
  35. first comment is a documentation comment: the intended audience is the consumer
  36. of this API. The call-site has access to the following information:
  37. - This comment
  38. - The name of the function, and the module in which it resides (efi::readdir)
  39. - The parameter names and types
  40. - The return type
  41. The goal is for the user of this function to gather enough information from
  42. these details to correctly utilize this API.
  43. The module in which it resides suggests that this function interacts with the
  44. EFI (Extensible Firmware Interface) standard, and the user would be wise to pair
  45. a reading of this code (or API) with skimming the relevant standard. Indeed, the
  46. strategic naming of the FILE\_PROTOCOL and FILE\_INFO types (notably written in
  47. defiance of the Hare style guide), provide hints to the relevant parts of the
  48. EFI specification to read for a complete understanding of this code.
  49. The name of the function is also carefully chosen to carry some weight: it is a
  50. reference to the Unix readdir function, which brings with it an intuition about
  51. its purpose and usage for programmers familiar with a Unix environment.
  52. The return type also provides hints about the function's use: it may return
  53. either a FILE\_INFO pointer, void (nothing), or an error. Without reading the
  54. documentation string, and taking the name and return type into account, we might
  55. (correctly) surmise that we need to call this function repeatedly to read file
  56. details out of a directory until it returns void, indicating that all entries
  57. have been processed, handling any errors which might occur along the way.
  58. We have established a lot of information about this function without actually
  59. reading the comment; in my philosophy of programming I view this information as
  60. a critical means for the author to communicate to the user, and we can lean on
  61. it to reduce the need for explicit documentation. Nevertheless, the
  62. documentation comment adds something here. The first sentence is a relatively
  63. information-sparse summary of the function's purpose, and mainly exists to tick
  64. a box in the Hare style guide.[^3] The second sentence is the only real reason
  65. this comment exists: to clarify an important detail for the user which is not
  66. apparent from the function signature, namely the storage semantics associated
  67. with the return value.
  68. [^3]: Which states that all exported functions that the module consumer is
  69. expected to use should have a comment, and that exported but undocumented
  70. symbols are exported to fulfill an implementation detail and not to provide a
  71. useful interface.
  72. Let's now study the second comment's purpose:
  73. ```hare
  74. // size(FILE_INFO) plus reserve up to 512 bytes for file name (FAT32
  75. // maximum, times two for wstr encoding)
  76. static let buf: [FILE_INFO_SIZE + 512]u8 = [0...];
  77. ```
  78. This comment exists to explain the use of the magic constant of 512. The
  79. audience of this comment is someone reading the *implementation* of this
  80. function. This audience has access to a different context than the user of the
  81. function, for instance they are expected to have a more comprehensive knowledge
  82. of EFI and are *definitely* expected to be reading the specification to a much
  83. greater degree of detail. We can and should lean on that context to make our
  84. comments more concise and useful.
  85. An alternative writing which does not rely on this context, and which in
  86. my view is strictly worse, may look like the following:
  87. ```hare
  88. // The FILE_INFO structure includes the file details plus a variable length
  89. // array for the filename. The underlying filesystem is always FAT32 per the
  90. // EFI specification, which has a maximum filename length of 256 characters. The
  91. // filename is encoded as a wide-string (UCS-2), which encodes two bytes per
  92. // character, and is not NUL-terminated, so we need to reserve up to 512 bytes
  93. // for the filename.
  94. static let buf: [FILE_INFO_SIZE + 512]u8 = [0...];
  95. ```
  96. The target audience of this comment should have a reasonable understanding of
  97. EFI. We simply need to clarify that this constant is the FAT32 max filename
  98. length, times two to account for the wstr encoding, and our magic constant is
  99. sufficiently explained.
  100. Let's move on to another kind of comment I occasionally write: medium-length
  101. prose. These often appear at the start of a function or the start of a file and
  102. serve to add context to the implementation, to justify the code's existence or
  103. explain why it works. Another sample:
  104. ```hare
  105. fn init_pagetables() void = {
  106. // 0xFFFF0000xxxxxxxx - 0xFFFF0200xxxxxxxx: identity map
  107. // 0xFFFF0200xxxxxxxx - 0xFFFF0400xxxxxxxx: identity map (dev)
  108. // 0xFFFF8000xxxxxxxx - 0xFFFF8000xxxxxxxx: kernel image
  109. //
  110. // L0[0x000] => L1_ident
  111. // L0[0x004] => L1_devident
  112. // L1_ident[*] => 1 GiB identity mappings
  113. // L0[0x100] => L1_kernel
  114. // L1_kernel[0] => L2_kernel
  115. // L2_kernel[0] => L3_kernel
  116. // L3_kernel[0] => 4 KiB kernel pages
  117. L0[0x000] = PT_TABLE | &L1_ident: uintptr | PT_AF;
  118. L0[0x004] = PT_TABLE | &L1_devident: uintptr | PT_AF;
  119. L0[0x100] = PT_TABLE | &L1_kernel: uintptr | PT_AF;
  120. L1_kernel[0] = PT_TABLE | &L2_kernel: uintptr | PT_AF;
  121. L2_kernel[0] = PT_TABLE | &L3_kernel: uintptr | PT_AF;
  122. for (let i = 0u64; i < len(L1_ident): u64; i += 1) {
  123. L1_ident[i] = PT_BLOCK | (i * 0x40000000): uintptr |
  124. PT_NORMAL | PT_AF | PT_ISM | PT_RW;
  125. };
  126. for (let i = 0u64; i < len(L1_devident): u64; i += 1) {
  127. L1_devident[i] = PT_BLOCK | (i * 0x40000000): uintptr |
  128. PT_DEVICE | PT_AF | PT_ISM | PT_RW;
  129. };
  130. };
  131. ```
  132. This comment shares a trait with the previous example: its purpose, in part, is
  133. to justify magic constants. It explains the indices of the arrays by way of the
  134. desired address space, and a perceptive reader will notice that 1 GiB =
  135. 1073741824 bytes = 0x40000000 bytes.
  136. To fully understand this, we must again consider the intended audience. This
  137. is an implementation comment, so the reader is an *implementer*. They will need
  138. to possess some familiarity with the behavior of page tables to be productive in
  139. this code, and they likely have the ARM manual up on their second monitor. This
  140. comment simply fills in the blanks for an informed reader.
  141. There are two additional kinds of comments I often write: TODO and XXX.
  142. A TODO comment indicates some important implementation deficiency; it *must* be
  143. addressed at some point in the future and generally indicates that the function
  144. does not meet its stated interface and is often accompanied by an assertion, or
  145. a link to a ticket on the bug tracker, or both.
  146. ```hare
  147. assert(ep.send == null); // TODO: support multiple senders
  148. ```
  149. This function should support multiple senders, but does not; an assertion here
  150. prevents the code from running under conditions it does not yet support and the
  151. TODO comment indicates that this should be addressed in the future. The target
  152. audience for this comment is someone who brings about these conditions and runs
  153. into the assertion failure.
  154. ```hare
  155. fn memory_empty(mem: *memory) bool = {
  156. // XXX: This O(n) linked list traversal is bad
  157. let next = mem.next;
  158. let pages = 0u;
  159. for (next != FREELIST_END; pages += 1) {
  160. const addr = mem.phys + (next * mem::PAGESIZE): uintptr;
  161. const ptr = mem::phys_tokernel(addr): *uint;
  162. next = *ptr;
  163. };
  164. return pages == mem.pages;
  165. };
  166. ```
  167. Here we find an example of an XXX comment. This code is correct: it implements
  168. the function's interface perfectly. However, given its expected usage, a
  169. performance of O(n) is not great: this function is expected to be used in hot
  170. paths. This comment documents the deficiency, and provides a hint to a reader
  171. that might be profiling this code in regards to a possible improvement.
  172. One final example:
  173. ```hare
  174. // Invalidates the TLB for a virtual address.
  175. export fn invalidate(virt: uintptr) void = {
  176. // TODO: Notify other cores (XXX SMP)
  177. invlpg(virt);
  178. };
  179. ```
  180. This is an atypical usage of XXX, but one which I still occasionally reach for.
  181. Here we have a TODO comment which indicates a case which this code does not
  182. consider, but which must be addressed in the future: it will have to raise an
  183. IPI to get other cores to invalidate the affected virtual address. However, this
  184. is one of many changes which fall under a broader milestone of SMP support, and
  185. the "XXX SMP" comment is here to make it easy to grep through the codebase for
  186. any places which are known to require attention while implementing SMP support.
  187. An XXX comment is often written for the purpose of being easily found with grep.
  188. That sums up most of the common reasons I will write a comment in my software.
  189. Each comment is written considering a target audience and the context provided
  190. by the code in which it resides, and aims to avoid stating redundant information
  191. within these conditions. It's for this reason that my code is sparse on
  192. comments: I find the information outside of the comments equally important and
  193. aim to be concise such that a comment is not redundant with information found
  194. elsewhere.
  195. Hopefully this post inspired some thought in you, to consider your comments
  196. deliberately and to be more aware of your ability to communicate information in
  197. other ways. Even if you chose to write your comments more densely than I do, I
  198. hope you will take care to communicate well through other mediums in your code
  199. as well.