drewdevault.com

[mirror] blog and personal website of Drew DeVault git clone https://hacktivis.me/git/mirror/drewdevault.com.git

---
title: My wish-list for the next YAML
date: 2021-07-28
outputs: [html, gemtext]
---

[YAML](http://yaml.org) is both universally used and universally reviled. It
has a lot of problems, but it is also so useful for specific tasks that it's
hard to replace. Some new kids on the block (such as TOML) have successfully
taken over a *portion* of its market share, but it remains in force in places
where those alternatives show their weaknesses.

I think it's clear to most that YAML is in dire need of replacement, which is
why many have tried. But many have also failed. So what are the key features of
YAML which demonstrate its strengths, and the key weaknesses that could be
improved upon?

Let's start with some things that YAML does well, which will have to be
preserved.

- **Hierarchical relationships emphasized with whitespace**. There is no better
  way of representing a hierarchical data structure than by actually organizing
  your information visually. Note that semantically meaningful whitespace is not
  actually required (the use of tokens like `{` is acceptable), so long as, by
  convention, hierarchies are visually apparent.
- **Defined independently of its implementation**. There should not be a
  canonical implementation of the format (though a reference implementation is,
  perhaps, acceptable). It should not be defined as "a config library for
  $language". Interoperability is key. It must have a specification.
- **Easily embeds documents written in other formats**. This is the chief reason
  that YAML still dominates in CI configuration: the ability to trivially write
  scripts directly into the config file, without escaping anything or otherwise
  mangling the script.

  ```yaml
  tasks:
  - configure: |
      jit_flags=""
      if [ "$(uname -m)" != "x86_64" ]
      then
          jit_flags=--without-jit
      fi
      ./configure \
          --prefix=/usr \
          $jit_flags
  - build: |
      make
  - test: |
      make check
  ```

- **Both machine- and human-editable**. It's very useful for both humans and
  machines to collaborate on a YAML file. For instance, humans write build
  manifests for their git.sr.ht repos, and then the project hub adds steps to
  download and apply patches from mailing lists before submitting them to the
  build driver. For the human's part, the ability to easily embed scripts (see
  above) and write other config parameters conveniently is very helpful.
  Everyone hates config.json.
- **Not a programming language**. YAML anchors and aliases are a problem, but
  we'll talk about that separately. In general, YAML files are not programs.
  They're just data. This is a good thing. If you want, you can use a
  *separate* pre-processor, like Jsonnet.

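To make the embedding point concrete, here is a minimal sketch of what happens to the configure step from the manifest above when it has to live in a JSON file instead. It uses only Python's standard `json` module; the variable names (`script`, `encoded`, `escape_overhead`) are mine, chosen for illustration.

```python
import json

# The configure step from the manifest above, as the plain shell text a human would write.
script = '''jit_flags=""
if [ "$(uname -m)" != "x86_64" ]
then
    jit_flags=--without-jit
fi
./configure \\
    --prefix=/usr \\
    $jit_flags
'''

# JSON has no block scalars, so the whole script collapses into one escaped string.
encoded = json.dumps({"configure": script})
print(encoded)

# Characters added purely for escaping (the two surrounding quotes are not counted).
escape_overhead = len(json.dumps(script)) - len(script) - 2
print(escape_overhead)
```

Every double quote, backslash and newline in the script picks up an escape, and the result is a single long line that nobody wants to edit by hand. A YAML block scalar carries the same text verbatim.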
What needs to be improved upon?

- **A much simpler grammar**. No more billion laughs, please. Besides this, 90%
  of YAML's features go unused, which increases the complexity of
  implementations, not to mention their attack surface, for little reason.
- **A means of defining a schema**, which can influence the interpretation of
  the input. YAML does this poorly. Consider the following YAML list:

  ```yaml
  items:
  - hello
  - 24
  - world
  ```

  Two of these are strings, and one is a number. Representing numbers and
  strings plainly like this makes the file easier for humans to write, but
  requiring humans to write their values in a format which provides an
  unambiguous type is not so inconvenient as to save this trait from the cutting
  room floor. Leaving the ambiguity in place, with no way to resolve it, is a
  major source of bugs in programs that consume YAML.
- **I don't care about JSON interoperability**. Being a superset of JSON is
  mildly useful, but not so much so as to compromise any other features or
  design. I'm prepared to yeet it at the first sign of code smells.

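As a rough illustration of the schema point, here is a sketch of how a schema could decide what the `items` list above means, instead of leaving the parser to guess from how each value looks. This is not any real YAML library's API: `guess`, `interpret` and the `Schema` type are hypothetical names, and the scalars are assumed to have already been read as plain strings.

```python
from typing import Callable, Dict, List

# Hypothetical schema: maps a key to the converter every item under it must pass through.
Schema = Dict[str, Callable[[str], object]]

def guess(scalar: str) -> object:
    """Schema-less resolution: anything that parses as an int becomes one."""
    try:
        return int(scalar)
    except ValueError:
        return scalar

def interpret(key: str, scalars: List[str], schema: Schema) -> List[object]:
    """Use the schema's converter for this key if there is one, otherwise guess."""
    convert = schema.get(key, guess)
    return [convert(s) for s in scalars]

raw = ["hello", "24", "world"]

untyped = interpret("items", raw, schema={})
typed = interpret("items", raw, schema={"items": str})
print(untyped)  # ['hello', 24, 'world']
print(typed)    # ['hello', '24', 'world']
```

With no schema, the consumer gets a list of mixed types and has to cope with it. With one, the same input yields three strings, and the human still gets to write `24` without quotes.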
Someday I may design something like this myself, but I'm really hoping that
someone else does it instead. Good luck!