Announcing-annotations-for-sourcehut.md (11894B)
- ---
- date: 2019-07-08
- layout: post
- title: Announcing code annotations for SourceHut
- tags: ["announcement", "sourcehut"]
- ---
- Today I'm happy to announce that code annotations are now available for
- [SourceHut](https://sourcehut.org)! <img style="display: inline; height: 1.2rem"
- src="/img/party.png" /> These allow you to decorate your code with arbitrary
- links and markdown. The end result looks something like this:
- **NOTICE**: Annotations were ultimately removed from sourcehut.
- ![](https://sr.ht/w767.png)
- <small class="text-muted">
- <a href="https://sourcehut.org">SourceHut</a> is the "hacker's forge", a
- 100% open-source platform for hosting Git & Mercurial repos, bug trackers,
- mailing lists, continuous integration, and more. No JavaScript required!
- </small>
- The annotations shown here are sourced from a JSON file which you can generate
- and upload during your CI process. It looks something like this:
- ```json
- {
- "98bc0394a2f15171fb113acb5a9286a7454f22e7": [
- {
- "type": "markdown",
- "lineno": 33,
- "title": "1 reference",
- "content": "- [../main.c:123](https://example.org)"
- },
- {
- "type": "link",
- "lineno": 38,
- "colno": 7,
- "len": 15,
- "to": "#L6"
- },
- ...
- ```
- You can probably infer from this that annotations are very powerful. Not only
- can you annotate your code's semantic elements to your heart's content, but you
- can also do exotic things we haven't thought of yet, for every programming
- language you can find a parser for.
- I'll be going into some detail on the thought process that went into this
- feature's design and implementation in a moment, but if you're just excited and
- want to try it out, here are a few interesting annotated repos to browse:
- - [~sircmpwn/scdoc][scdoc]: man page generator (C)
- - [~sircmpwn/aerc][aerc]: TUI email client (Go)
- - [~mcf/cproc][cproc]: C compiler (C)
- [scdoc]: https://git.sr.ht/~sircmpwn/scdoc/tree/master/src/main.c
- [aerc]: https://git.sr.ht/~sircmpwn/aerc/tree/master/widgets/msgviewer.go
- [cproc]: https://git.sr.ht/~mcf/cproc/tree/master/scan.c
- And here are the docs for generating your own: [annotations on
- git.sr.ht](https://man.sr.ht/git.sr.ht/annotations.md). Currently annotators are
- available for C and Go, and I intend to write another for Python. For the rest,
- I'll be relying on the community to put together annotators for their favorite
- programming languages, and to help me expand on the ones I've built.
- ## Design
- A lot of design thought went into this feature, but I knew one thing from the
- outset: I wanted to make a generic system that users could use to annotate their
- source code in any manner they chose. My friend Andrew Kelley (of
- [Zig](https://ziglang.org/) fame) once expressed to me his frustration with
- GitHub's refusal to implement syntax highlighting for "small" languages, citing
- a shortage of manpower. It's for this reason that it's important to me that
- SourceHut's open-source platform allows users large and small to volunteer to
- build the perfect integration for their needs - I don't scale alone[^1].
- [^1]: For the syntax highlighting problem, by the way, this is accomplished by using Pygments. Improvements to Pygments reach not only SourceHut, but a large community of projects, making the software ecosystem better for everyone.
- To get a head start for the most common use-cases - scanning source files and
- linking references and definitions together - the best approach was unclear. I
- spent a lot of time studying [ctags](http://ctags.sourceforge.net/), for
- example, which supports a huge set of programming languages, but unfortunately
- only finds definitions. I thought about combining this with another approach for
- finding references, but the only generic library with lots of parsers I'm aware
- of is [Pygments](http://pygments.org/), and I didn't necessarily want to bring
- Python into every user's CI process if they weren't already using it. That
- approach would also make it more difficult to customize the annotations for each
- language. Other options I considered were
- [cscope](http://cscope.sourceforge.net/) and
- [gtags](https://www.gnu.org/software/global/), but the former doesn't have many
- programming languages supported (making the tradeoff questionable), and the
- latter just uses Pygments anyway.
- So I decided: I'm going to write my own annotators for each language. Or at
- least the languages I use the most:
- - C, because I like it but also because
- [scdoc](https://git.sr.ht/~sircmpwn/scdoc) is the demo repo shown on the
- [SourceHut marketing page](https://sourcehut.org).
- - Python, because SourceHut is largely written in Python and using it to browse
- itself would be cool.
- - Go, because parts of SourceHut are written in it but also because I use it a
- lot for [my own projects](https://git.sr.ht/~sircmpwn/aerc). I also knew that
- Go had at least *some* first-class support for working with its AST - and boy
- was I in for a surprise.
- With these initial languages decided, let's turn to the implementations.
- ## Annotating C code
- I began with the C annotator, because I knew it would be the most difficult.
- There does not exist any widely available standalone C parsing library to
- provide C programs with access to an AST. There's LLVM, but I have a deeply held
- belief that programming language compiler and introspection tooling should be
- implemented in the language itself. So, I set about to write a C parser from
- scratch.
- Or, almost from scratch. There exist two standard POSIX tools for writing
- compilers with: [lex][lex] and [yacc][yacc], which are respectively a lexer
- generator and a compiler compiler. Additionally, there are [pre-fab lex and
- yacc files](http://www.quut.com/c/ANSI-C-grammar-y.html) which *mostly*
- implement the C11 standard grammar. However, C is [not a context-free
- language][context], so additional work was necessary to track typedefs and use
- them to change future tokens emitted by the scanner. A little more work was also
- necessary for keeping track of line and column numbers in the lexer. Overall,
- however, this was relatively easy, and in less than a day's work I had a fully
- functional C11 parser.
- [lex]: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/lex.html
- [yacc]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/yacc.html
- [context]: https://eli.thegreenplace.net/2007/11/24/the-context-sensitivity-of-cs-grammar/
- However, my celebration was short-lived as I started to feed my parser C
- programs from the wild. The GNU C Compiler, GCC, implements many C extensions,
- and their use, while inadvisable, is extremely common. Not least of the
- offenders is glibc, and thus running my parser on any system with glibc headers
- installed would likely immediately run into syntax errors. GCC's extensions are
- not documented in the form of an addendum to the C specification, but rather as
- end-user documentation and a 15 million lines-of-code compiler for you to
- reverse engineer. It took me almost a week of frustration to get a parser which
- worked passably on a large subset of the C programs found in the wild, and I
- imagine I'll be dealing with GNU problems for years to come. Please don't use C
- extensions, folks.
- In any case, the result now works fairly well for a lot of programs, and I have
- plans on expanding it to integrate more nicely with build systems like meson.
- Check out the code here: [annotatec](https://git.sr.ht/~sircmpwn/annotatec). The
- features of the C annotator include:
- - Annotating function definitions with a list of files/linenos which call them
- - Linking function calls to the definition of that function
- In the future I intend to add support for linking to external symbols as well -
- for example, linking to the POSIX spec for functions specified by POSIX, or to
- the Linux man pages for Linux calls. It would also be pretty cool to support
- linking between related projects, so that wlroots calls in sway can be linked to
- their declarations in the wlroots repo.
- ## Annotating Go code
- The Go annotator was far easier. I started over my morning cup of coffee today
- and I was finished with the basics by lunch. Go has a bunch of support in the
- standard library for parsing and analyzing Go programs - I was very impressed:
- - [go/ast](https://golang.org/pkg/go/ast/)
- - [go/scanner](https://golang.org/pkg/go/scanner/)
- - [go/token](https://golang.org/pkg/go/token/)
- - [go/types](https://golang.org/pkg/go/types/)
- To support Go 1.12's go modules, the experimental (but good enough)
- [packages](https://godoc.org/golang.org/x/tools/go/packages) module is available
- as well. All of this is nicely summarized by a lovely document in the [golang
- examples repository](https://github.com/golang/example/tree/master/gotypes). The
- type checker is also available as a library, something which is less common even
- among languages with parsers-as-libraries, and allows for many features which
- would be very difficult without it. Nice work, Go!
- The [resulting annotator](https://git.sr.ht/~sircmpwn/annotatego) clocks in at
- just over 250 lines of code - compare that to the C annotator's ~1,300 lines of
- C, lex, and yacc source code. The Go annotator is more featureful, too, it can:
- - Link function calls to their definitions, and in reverse
- - Link method calls to their definitions, and in reverse
- - Link variables to their definitions, even in other files
- - Link to godoc for symbols defined in external packages
- I expect a lot more to be possible in the future. It might get noisy if you turn
- everything on, so each annotation type is gated behind a command line flag.
- ## Displaying annotations
- Displaying these annotations required a bit more effort than I would have liked,
- but the end result is fairly clean and reusable. Since SourceHut uses Pygments
- for syntax highlighting, I ended up writing a [custom
- Formatter](http://pygments.org/docs/formatterdevelopment/) based on the existing
- Pygments HtmlFormatter. The result is the [AnnotationFormatter][git.sr.ht
- formatter], which splices annotations into the highlighted code. One downside of
- this approach is that it works at the token level - a more sophisticated
- implementation will be necessary for annotations that span more than a single
- token. Annotations are fairly expensive to render, so the rendered HTML is
- stowed in Redis.
- [git.sr.ht formatter]: https://git.sr.ht/~sircmpwn/git.sr.ht/tree/master/gitsrht/annotations.py
- ## The future?
- I intend to write a Python annotator soon, and I'll be relying on the community
- to build more. If you're looking for a fun weekend hack and a chance to learn
- more about your favorite programming language, this'd be a great project. The
- format for annotations on SourceHut is also pretty generalizable, so I encourage
- other code forges to reuse it so that our annotators are useful on every code
- hosting platform.
- builds.sr.ht will also soon grow first-class support for making these annotators
- available to your build process, as well as for making an OAuth token available
- (ideally with a limited set of permissions) to your build environment. Rigging
- up an annotator is a bit involved today ([though the docs
- help](https://man.sr.ht/git.sr.ht/annotations.md)), and streamlining that
- process will be pretty helpful. Additionally, this feature is only available for
- git.sr.ht, though it should generalize to hg.sr.ht fairly easily and I hope
- we'll see it available there soon.
- I'm also looking forward to seeing more novel use-cases for annotation. Can we
- indicate code coverage by coloring a gutter alongside each line of code? Can we
- link references to ticket numbers in the comments to your bug tracker? If you
- have any cool ideas, I'm all ears. Here's that list of cool annotated repos to
- browse again, if you made it this far and want to check them out:
- - [~sircmpwn/scdoc][scdoc]: man page generator (C)
- - [~sircmpwn/aerc][aerc]: TUI email client (Go)
- - [~mcf/cproc][cproc]: C compiler (C)