
drewdevault.com

[mirror] blog and personal website of Drew DeVault git clone https://hacktivis.me/git/mirror/drewdevault.com.git

The-next-YAML.md (3949B)


---
title: My wish-list for the next YAML
date: 2021-07-28
---

[YAML](http://yaml.org) is both universally used and universally reviled. It
has a lot of problems, but it is also so useful for solving specific tasks that
it's hard to replace. Some new kids on the block (such as TOML) have
successfully taken over a *portion* of its market share, but it remains in force
in places where those alternatives show their weaknesses.

I think it's clear to most that YAML is in dire need of replacement, which is
why many have tried. But many have also failed. So what are the key features of
YAML which demonstrate its strengths, and which key weaknesses could be
improved upon?

Let's start with some things that YAML does well, which will have to be
preserved.

- **Hierarchical relationships emphasized with whitespace**. There is no better
  way of representing a hierarchical data structure than by actually organizing
  your information visually. Note that semantically meaningful whitespace is not
  actually required (the use of tokens like { is acceptable), so long as, by
  convention, hierarchies are visually apparent.
- **Defined independently of its implementation**. There should not be a
  canonical implementation of the format (though a reference implementation is,
  perhaps, acceptable). It should not be defined as "a config library for
  $language". Interoperability is key. It must have a specification.
- **Easily embeds documents written in other formats**. This is the chief reason
  that YAML still dominates in CI configuration: the ability to trivially write
  scripts directly into a config file, without escaping anything or otherwise
  mangling the script.

  ```yaml
  tasks:
  - configure: |
      jit_flags=""
      if [ "$(uname -m)" != "x86_64" ]
      then
          jit_flags=--without-jit
      fi
      ./configure \
          --prefix=/usr \
          $jit_flags
  - build: |
      make
  - test: |
      make check
  ```

- **Both machine- and human-editable**. It's very useful for humans and
  machines to be able to collaborate on a YAML file. For instance, humans write
  build manifests for their git.sr.ht repos, and then the project hub adds steps
  to download and apply patches from mailing lists before submitting them to the
  build driver. For the human's part, the ability to easily embed scripts (see
  above) and write other config parameters conveniently is very helpful;
  everyone hates config.json.
- **Not a programming language**. YAML's anchors and aliases (its take on
  entities) are a problem, but we'll talk about that separately. In general,
  YAML files are not programs. They're just data. This is a good thing. If you
  want, you can use a *separate* pre-processor, like jsonnet.

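To make the "without escaping anything" point concrete, here is a minimal
sketch of what happens to the `configure` script from the manifest above when
it has to live in a JSON file instead. Python and its standard `json` module
are only my choice for illustration; the variable names are made up for this
example and are not part of any build service.

```python
import json

# The configure step, exactly as a human would type it into a YAML block
# scalar. A raw string is used so the backslashes stay as written.
script = r'''jit_flags=""
if [ "$(uname -m)" != "x86_64" ]
then
    jit_flags=--without-jit
fi
./configure \
    --prefix=/usr \
    $jit_flags
'''

# JSON has no multi-line string literal, so the whole script collapses onto
# one line, with every newline, double quote, and backslash escaped.
encoded = json.dumps({"configure": script})
print(encoded)

# The data itself survives the round trip; only the human-facing form suffers.
decoded = json.loads(encoded)["configure"]
```

The machine is perfectly happy with the one-line version. It's the human who
has to read and edit it who loses out, which is the trade-off a YAML
replacement would need to avoid.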
What needs to be improved upon?

- **A much simpler grammar**. No more billion laughs, please. Besides this, 90%
  of YAML's features go unused, which increases the complexity of
  implementations, not to mention their attack surface, for little reason.
- **A means of defining a schema**, which can influence the interpretation of
  the input. YAML does this poorly. Consider the following YAML list:

  ```yaml
  items:
  - hello
  - 24
  - world
  ```

  Two of these are strings, and one is a number. Representing numbers and
  strings plainly like this makes it easier for humans to write, but requiring
  humans to write their values in a format which provides an unambiguous type
  is not so inconvenient as to save this trait from the cutting-room floor.
  Leaving the ambiguity in place, without any redress, is a major source of
  bugs in programs that consume YAML.
- **I don't care about JSON interoperability**. Being a superset of JSON is
  mildly useful, but not so much so as to compromise any other feature or
  design decision. I'm prepared to yeet it at the first sign of code smells.

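As a rough illustration of the schema point, here is a small Python sketch
that contrasts guessing a type from how a scalar looks with letting the
consumer declare the type it expects. Both `guess_type` and `read_as` are
hypothetical helpers written for this example; the guessing rules are a
deliberately tiny stand-in, not the resolution rules of any real YAML parser.

```python
import re

def guess_type(raw: str):
    """Guess a type from the scalar's appearance (simplified assumption)."""
    if re.fullmatch(r"[-+]?[0-9]+", raw):
        return int(raw)
    if raw in ("true", "false"):
        return raw == "true"
    return raw

def read_as(raw: str, expected: type):
    """Interpret the scalar as the type the consumer asked for."""
    if expected is str:
        return raw
    if expected is int:
        return int(raw)  # raises ValueError if the text is not an integer
    raise TypeError(f"unsupported type: {expected!r}")

# The three scalars from the list above, as they appear in the file.
items = ["hello", "24", "world"]

# Implicit typing: the middle item silently becomes a number.
guessed = [guess_type(s) for s in items]
print(guessed)     # ['hello', 24, 'world']

# Schema-driven: a consumer that wants a list of strings gets one.
as_strings = [read_as(s, str) for s in items]
print(as_strings)  # ['hello', '24', 'world']
```

With the second approach, a mismatch between the file and the program's
expectations shows up as an error at load time, rather than as a list that
quietly mixes types.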
Someday I may design something like this myself, but I'm really hoping that
someone else does it instead. Good luck!