drewdevault.com

[mirror] blog and personal website of Drew DeVault git clone https://hacktivis.me/git/mirror/drewdevault.com.git


---
date: 2019-07-08
layout: post
title: Announcing code annotations for SourceHut
tags: ["announcement", "sourcehut"]
---

Today I'm happy to announce that code annotations are now available for
[SourceHut](https://sourcehut.org)! <img style="display: inline; height: 1.2rem"
src="/img/party.png" /> These allow you to decorate your code with arbitrary
links and markdown. The end result looks something like this:

**NOTICE**: Annotations were ultimately removed from sourcehut.

![](https://sr.ht/w767.png)

<small class="text-muted">
<a href="https://sourcehut.org">SourceHut</a> is the "hacker's forge", a
100% open-source platform for hosting Git &amp; Mercurial repos, bug trackers,
mailing lists, continuous integration, and more. No JavaScript required!
</small>

The annotations shown here are sourced from a JSON file which you can generate
and upload during your CI process. It looks something like this:

```json
{
  "98bc0394a2f15171fb113acb5a9286a7454f22e7": [
    {
      "type": "markdown",
      "lineno": 33,
      "title": "1 reference",
      "content": "- [../main.c:123](https://example.org)"
    },
    {
      "type": "link",
      "lineno": 38,
      "colno": 7,
      "len": 15,
      "to": "#L6"
    },
    ...
```

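As a rough illustration of how a CI job might produce such a file, here is a minimal Python sketch that builds the two annotation types shown above and serializes them. The field names are taken from the example; the helper names are made up for this sketch, the top-level key is assumed to identify the annotated Git blob, and the upload step is left out entirely.

```python
import json


def markdown_annotation(lineno, title, content):
    """Build a "markdown" annotation attached to one source line."""
    return {"type": "markdown", "lineno": lineno, "title": title, "content": content}


def link_annotation(lineno, colno, length, to):
    """Build a "link" annotation covering `length` characters on a line."""
    return {"type": "link", "lineno": lineno, "colno": colno, "len": length, "to": to}


def build_annotations(blob_id, annotations):
    """Key the list by blob ID (an assumption about what the top-level key means)."""
    return {blob_id: list(annotations)}


doc = build_annotations(
    "98bc0394a2f15171fb113acb5a9286a7454f22e7",
    [
        markdown_annotation(33, "1 reference", "- [../main.c:123](https://example.org)"),
        link_annotation(38, 7, 15, "#L6"),
    ],
)
payload = json.dumps(doc, indent=2)  # text a CI step could write to a file
```

An annotator for any language only has to emit dictionaries of this shape, which is what keeps the format independent of the tool that generates it.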
You can probably infer from this that annotations are very powerful. Not only
can you annotate your code's semantic elements to your heart's content, but you
can also do exotic things we haven't thought of yet, for every programming
language you can find a parser for.

I'll be going into some detail on the thought process that went into this
feature's design and implementation in a moment, but if you're just excited and
want to try it out, here are a few interesting annotated repos to browse:

- [~sircmpwn/scdoc][scdoc]: man page generator (C)
- [~sircmpwn/aerc][aerc]: TUI email client (Go)
- [~mcf/cproc][cproc]: C compiler (C)

[scdoc]: https://git.sr.ht/~sircmpwn/scdoc/tree/master/src/main.c
[aerc]: https://git.sr.ht/~sircmpwn/aerc/tree/master/widgets/msgviewer.go
[cproc]: https://git.sr.ht/~mcf/cproc/tree/master/scan.c

And here are the docs for generating your own: [annotations on
git.sr.ht](https://man.sr.ht/git.sr.ht/annotations.md). Currently annotators are
available for C and Go, and I intend to write another for Python. For the rest,
I'll be relying on the community to put together annotators for their favorite
programming languages, and to help me expand on the ones I've built.

## Design

A lot of design thought went into this feature, but I knew one thing from the
outset: I wanted to make a generic system that users could use to annotate their
source code in any manner they chose. My friend Andrew Kelley (of
[Zig](https://ziglang.org/) fame) once expressed to me his frustration with
GitHub's refusal to implement syntax highlighting for "small" languages, citing
a shortage of manpower. It's for this reason that it's important to me that
SourceHut's open-source platform allows users large and small to volunteer to
build the perfect integration for their needs - I don't scale alone[^1].

[^1]: For the syntax highlighting problem, by the way, this is accomplished by using Pygments. Improvements to Pygments reach not only SourceHut, but a large community of projects, making the software ecosystem better for everyone.

For the most common use-case - scanning source files and linking references and
definitions together - the best way to get a head start was unclear. I
spent a lot of time studying [ctags](http://ctags.sourceforge.net/), for
example, which supports a huge set of programming languages, but unfortunately
only finds definitions. I thought about combining this with another approach for
finding references, but the only generic library with lots of parsers I'm aware
of is [Pygments](http://pygments.org/), and I didn't necessarily want to bring
Python into every user's CI process if they weren't already using it. That
approach would also make it more difficult to customize the annotations for each
language. Other options I considered were
[cscope](http://cscope.sourceforge.net/) and
[gtags](https://www.gnu.org/software/global/), but the former doesn't support
many programming languages (making the tradeoff questionable), and the
latter just uses Pygments anyway.

So I decided: I'm going to write my own annotators for each language. Or at
least the languages I use the most:

- C, because I like it but also because
  [scdoc](https://git.sr.ht/~sircmpwn/scdoc) is the demo repo shown on the
  [SourceHut marketing page](https://sourcehut.org).
- Python, because SourceHut is largely written in Python and using it to browse
  itself would be cool.
- Go, because parts of SourceHut are written in it but also because I use it a
  lot for [my own projects](https://git.sr.ht/~sircmpwn/aerc). I also knew that
  Go had at least *some* first-class support for working with its AST - and boy
  was I in for a surprise.

With these initial languages decided, let's turn to the implementations.

## Annotating C code

I began with the C annotator, because I knew it would be the most difficult.
No widely available standalone C parsing library exists to
provide C programs with access to an AST. There's LLVM, but I have a deeply held
belief that programming language compiler and introspection tooling should be
implemented in the language itself. So, I set about writing a C parser from
scratch.

Or, almost from scratch. Two standard POSIX tools exist for writing
compilers: [lex][lex] and [yacc][yacc], which are respectively a lexer
generator and a compiler-compiler (parser generator). Additionally, there are
[pre-fab lex and yacc files](http://www.quut.com/c/ANSI-C-grammar-y.html) which
*mostly* implement the C11 standard grammar. However, C is [not a context-free
language][context], so additional work was necessary to track typedefs and use
them to change future tokens emitted by the scanner. A little more work was also
necessary for keeping track of line and column numbers in the lexer. Overall,
however, this was relatively easy, and in less than a day's work I had a fully
functional C11 parser.

[lex]: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/lex.html
[yacc]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/yacc.html
[context]: https://eli.thegreenplace.net/2007/11/24/the-context-sensitivity-of-cs-grammar/

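To make the typedef tracking concrete, here is a small Python sketch of the general idea: the scanner remembers names introduced by `typedef` and emits a different token kind for them afterwards, while counting lines and columns as it goes. This is not the annotatec code, which is written in C with lex and yacc; the token kinds, the keyword subset, and the `scan` function are all invented for illustration, and only simple declarators are handled (no function-pointer typedefs, no scoping).

```python
import re

# One token per match: whitespace, a word, an integer, or any other character.
TOKEN_RE = re.compile(
    r"(?P<ws>\s+)|(?P<word>[A-Za-z_]\w*)|(?P<num>\d+)|(?P<punct>.)", re.S
)
# A small subset of C keywords, enough for the example below.
KEYWORDS = {"typedef", "struct", "unsigned", "long", "int", "char", "void"}


def scan(source):
    """Return (kind, text, lineno, colno) tokens; lines 1-based, columns 0-based."""
    typedef_names = set()  # names declared by typedefs seen so far
    in_typedef = False     # currently inside a typedef declaration?
    last_ident = None      # most recent plain identifier in that declaration
    depth = 0              # brace depth, so struct members don't end the typedef
    tokens = []
    line, col = 1, 0
    for match in TOKEN_RE.finditer(source):
        text, group = match.group(), match.lastgroup
        if group == "word":
            if text in KEYWORDS:
                kind = "KEYWORD"
                in_typedef = in_typedef or text == "typedef"
            elif text in typedef_names:
                kind = "TYPE_NAME"  # the parser sees a type, not an identifier
            else:
                kind = "IDENTIFIER"
                if in_typedef and depth == 0:
                    last_ident = text
            tokens.append((kind, text, line, col))
        elif group == "num":
            tokens.append(("NUMBER", text, line, col))
        elif group == "punct":
            if text == "{":
                depth += 1
            elif text == "}":
                depth -= 1
            elif text == ";" and in_typedef and depth == 0:
                # Simple declarators only: the last identifier is the new type.
                if last_ident is not None:
                    typedef_names.add(last_ident)
                in_typedef, last_ident = False, None
            tokens.append(("PUNCT", text, line, col))
        # Advance the line/column counters past this token.
        newlines = text.count("\n")
        if newlines:
            line += newlines
            col = len(text) - text.rfind("\n") - 1
        else:
            col += len(text)
    return tokens
```

Scanning `typedef unsigned long size_type;` followed by `size_type n;` yields `size_type` as an `IDENTIFIER` in the first declaration and as a `TYPE_NAME` in the second, which is the distinction a context-free grammar cannot make on its own.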
However, my celebration was short-lived as I started to feed my parser C
programs from the wild. The GNU Compiler Collection, GCC, implements many C
extensions, and their use, while inadvisable, is extremely common. Not least of
the offenders is glibc, and thus running my parser on any system with glibc
headers installed would likely immediately run into syntax errors. GCC's
extensions are not documented in the form of an addendum to the C specification,
but rather as end-user documentation and a 15 million lines-of-code compiler for
you to reverse engineer. It took me almost a week of frustration to get a parser
which worked passably on a large subset of the C programs found in the wild, and
I imagine I'll be dealing with GNU problems for years to come. Please don't use
C extensions, folks.

In any case, the result now works fairly well for a lot of programs, and I
plan to expand it to integrate more nicely with build systems like meson.
Check out the code here: [annotatec](https://git.sr.ht/~sircmpwn/annotatec). The
features of the C annotator include:

- Annotating function definitions with a list of files/linenos which call them
- Linking function calls to the definition of that function

In the future I intend to add support for linking to external symbols as well -
for example, linking to the POSIX spec for functions specified by POSIX, or to
the Linux man pages for Linux calls. It would also be pretty cool to support
linking between related projects, so that wlroots calls in sway can be linked to
their declarations in the wlroots repo.

## Annotating Go code

The Go annotator was far easier. I started over my morning cup of coffee today
and I was finished with the basics by lunch. Go has a bunch of support in the
standard library for parsing and analyzing Go programs - I was very impressed:

- [go/ast](https://golang.org/pkg/go/ast/)
- [go/scanner](https://golang.org/pkg/go/scanner/)
- [go/token](https://golang.org/pkg/go/token/)
- [go/types](https://golang.org/pkg/go/types/)

To support Go 1.12's Go modules, the experimental (but good enough)
[packages](https://godoc.org/golang.org/x/tools/go/packages) package is available
as well. All of this is nicely summarized by a lovely document in the [golang
examples repository](https://github.com/golang/example/tree/master/gotypes). The
type checker is also available as a library, something which is less common even
among languages with parsers-as-libraries, and allows for many features which
would be very difficult without it. Nice work, Go!

The [resulting annotator](https://git.sr.ht/~sircmpwn/annotatego) clocks in at
just over 250 lines of code - compare that to the C annotator's ~1,300 lines of
C, lex, and yacc source code. The Go annotator is more featureful, too. It can:

- Link function calls to their definitions, and in reverse
- Link method calls to their definitions, and in reverse
- Link variables to their definitions, even in other files
- Link to godoc for symbols defined in external packages

I expect a lot more to be possible in the future. It might get noisy if you turn
everything on, so each annotation type is gated behind a command line flag.

## Displaying annotations

Displaying these annotations required a bit more effort than I would have liked,
but the end result is fairly clean and reusable. Since SourceHut uses Pygments
for syntax highlighting, I ended up writing a [custom
Formatter](http://pygments.org/docs/formatterdevelopment/) based on the existing
Pygments HtmlFormatter. The result is the [AnnotationFormatter][git.sr.ht
formatter], which splices annotations into the highlighted code. One downside of
this approach is that it works at the token level - a more sophisticated
implementation will be necessary for annotations that span more than a single
token. Annotations are fairly expensive to render, so the rendered HTML is
stowed in Redis.

[git.sr.ht formatter]: https://git.sr.ht/~sircmpwn/git.sr.ht/tree/master/gitsrht/annotations.py

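The splicing step can be pictured with a much simpler stand-in. The Python sketch below does not use Pygments and is not the AnnotationFormatter: it works on one plain line of text rather than on highlighted tokens, the `render_line` name is invented, and `colno` is assumed to be a 0-based character offset. It only shows how "link" annotations with `colno`, `len` and `to` could be turned into anchors around the right characters.

```python
import html


def render_line(text, links):
    """Return HTML for one source line with link annotations spliced in.

    `links` is a list of dicts with "colno", "len" and "to" keys, as in the
    "link" annotation type; colno is assumed to be a 0-based offset.
    """
    out = []
    pos = 0  # how much of the line has been emitted so far
    for link in sorted(links, key=lambda item: item["colno"]):
        start = link["colno"]
        end = start + link["len"]
        if start < pos or end > len(text):
            continue  # skip overlapping or out-of-range annotations
        out.append(html.escape(text[pos:start]))
        out.append('<a href="{}">{}</a>'.format(
            html.escape(link["to"], quote=True),
            html.escape(text[start:end]),
        ))
        pos = end
    out.append(html.escape(text[pos:]))
    return "".join(out)
```

For example, `render_line("x = foo(a < b)", [{"colno": 4, "len": 3, "to": "#L6"}])` wraps `foo` in a link to `#L6` and escapes the `<` in the rest of the line. Doing the same thing on top of already-highlighted tokens is where the extra effort described above comes in.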
## The future?

I intend to write a Python annotator soon, and I'll be relying on the community
to build more. If you're looking for a fun weekend hack and a chance to learn
more about your favorite programming language, this'd be a great project. The
format for annotations on SourceHut is also pretty generalizable, so I encourage
other code forges to reuse it so that our annotators are useful on every code
hosting platform.

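To give a feel for the size of such a weekend hack, here is a hedged sketch of what the core of a Python annotator could look like using only the standard library's `ast` module. It is not an existing SourceHut tool: the `annotate_python` name is made up, only plain calls to top-level functions in the same file are linked, and the `#L<n>` target style is borrowed from the JSON example earlier. Note that `col_offset` is a 0-based UTF-8 byte offset, which may or may not match what git.sr.ht expects for `colno`.

```python
import ast


def annotate_python(source):
    """Return "link" annotations for calls to functions defined in `source`."""
    tree = ast.parse(source)
    # Map each top-level function name to the line where it is defined.
    defs = {
        node.name: node.lineno
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }
    annotations = []
    for node in ast.walk(tree):
        # Only plain `name(...)` calls; methods and imported names are ignored.
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            target = defs.get(node.func.id)
            if target is not None:
                annotations.append({
                    "type": "link",
                    "lineno": node.func.lineno,
                    "colno": node.func.col_offset,
                    "len": len(node.func.id),
                    "to": "#L{}".format(target),
                })
    return sorted(annotations, key=lambda a: (a["lineno"], a["colno"]))
```

A real annotator would also need to resolve methods, imports and symbols in other files, which is where most of the work would go.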
builds.sr.ht will also soon grow first-class support for making these annotators
available to your build process, as well as for making an OAuth token available
(ideally with a limited set of permissions) to your build environment. Rigging
up an annotator is a bit involved today ([though the docs
help](https://man.sr.ht/git.sr.ht/annotations.md)), and streamlining that
process will be pretty helpful. Additionally, this feature is only available for
git.sr.ht, though it should generalize to hg.sr.ht fairly easily and I hope
we'll see it available there soon.

I'm also looking forward to seeing more novel use-cases for annotation. Can we
indicate code coverage by coloring a gutter alongside each line of code? Can we
link references to ticket numbers in the comments to your bug tracker? If you
have any cool ideas, I'm all ears. Here's that list of cool annotated repos to
browse again, if you made it this far and want to check them out:

- [~sircmpwn/scdoc][scdoc]: man page generator (C)
- [~sircmpwn/aerc][aerc]: TUI email client (Go)
- [~mcf/cproc][cproc]: C compiler (C)