drewdevault.com

[mirror] blog and personal website of Drew DeVault git clone https://hacktivis.me/git/mirror/drewdevault.com.git

---
date: 2018-12-28
layout: post
title: Anatomy of a shell
tags: ["shell"]
---
I've been contributing where I can to Simon Ser's [mrsh][mrsh] project, a
work-in-progress strictly POSIX shell implementation. I worked on some small
mrsh features during my holiday travels and it's at the forefront of my mind, so
I'd like to share some of its design details with you.

[mrsh]: https://git.sr.ht/~emersion/mrsh

There are two main components to a shell: parsing and execution. mrsh uses a
simple [recursive descent parser][rd-parser] to generate an AST (Abstract Syntax
Tree, or an in-memory model of the structure of the parsed source). This design
was chosen to simplify the code and avoid dependencies like flex/bison, and is a
good choice given that performance isn't critical for parsing shell scripts.
Here's an example of the input source and output AST:

[rd-parser]: https://en.wikipedia.org/wiki/Recursive_descent_parser

```sh
#!/bin/sh
say_hello() {
	echo "hello $1!"
}

who=$(whoami)
say_hello "$who"
```

This script is parsed into this AST (this is the output of `mrsh -n test.sh`):

```
program
program
└─command_list ─ pipeline
  └─function_definition say_hello ─ brace_group
    └─command_list ─ pipeline
      └─simple_command
        ├─name ─ word_string [3:2 → 3:6] echo
        └─argument 1 ─ word_list (quoted)
          ├─word_string [3:8 → 3:14] hello
          ├─word_parameter
          │ └─name 1
          └─word_string [3:16 → 3:17] !
program
program
└─command_list ─ pipeline
  └─simple_command
    └─assignment
      ├─name who
      └─value ─ word_command ─ program
        └─command_list ─ pipeline
          └─simple_command
            └─name ─ word_string [6:7 → 6:13] whoami
program
└─command_list ─ pipeline
  └─simple_command
    ├─name ─ word_string [7:1 → 7:10] say_hello
    └─argument 1 ─ word_list (quoted)
      └─word_parameter
        └─name who
```

Most of these names come directly from the [POSIX shell specification][spec].
The parser and AST are made available as a standalone public interface of
libmrsh, which can be used for a variety of use-cases like syntax-aware text
editors, syntax highlighting (see [`highlight.c`][hl.c]), linters, etc. The most
important use-case is, of course, task planning and execution.

[spec]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html
[hl.c]: https://git.sr.ht/~emersion/mrsh/tree/master/highlight.c

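To make the recursive descent approach mentioned earlier concrete, here is a minimal sketch in C. It is not mrsh's parser: the three-rule toy grammar, the `parse` function, and the parenthesized output format are all assumptions made for illustration. What it shows is the shape of the technique: one function per grammar rule, each calling the functions for the rules it contains.

```c
#include <stdio.h>
#include <string.h>

/* Toy grammar (an assumption for illustration, far smaller than POSIX sh):
 *   list     := pipeline (';' pipeline)*
 *   pipeline := command ('|' command)*
 *   command  := WORD+
 */
struct parser {
	const char *cur; /* next unread character */
	char out[256];   /* textual rendering of the tree built so far */
	size_t len;
};

/* Appends n bytes of s to the rendering, truncating instead of overflowing. */
static void emit(struct parser *p, const char *s, size_t n) {
	if (p->len + n >= sizeof(p->out)) {
		n = sizeof(p->out) - 1 - p->len;
	}
	memcpy(p->out + p->len, s, n);
	p->len += n;
	p->out[p->len] = '\0';
}

static void skip_blanks(struct parser *p) {
	while (*p->cur == ' ' || *p->cur == '\t') {
		p->cur++;
	}
}

/* Consumes c if it is the next non-blank character. */
static int eat(struct parser *p, char c) {
	skip_blanks(p);
	if (*p->cur != c) {
		return 0;
	}
	p->cur++;
	return 1;
}

/* command := WORD+ ; fails (returns 0) if no word is present. */
static int parse_command(struct parser *p) {
	int nwords = 0;
	emit(p, "(cmd", 4);
	for (;;) {
		skip_blanks(p);
		const char *start = p->cur;
		while (*p->cur != '\0' && strchr(" \t|;", *p->cur) == NULL) {
			p->cur++;
		}
		if (p->cur == start) {
			break; /* operator or end of input: no more words */
		}
		emit(p, " ", 1);
		emit(p, start, (size_t)(p->cur - start));
		nwords++;
	}
	emit(p, ")", 1);
	return nwords > 0;
}

/* pipeline := command ('|' command)* */
static int parse_pipeline(struct parser *p) {
	emit(p, "(pipe ", 6);
	if (!parse_command(p)) {
		return 0;
	}
	while (eat(p, '|')) {
		emit(p, " ", 1);
		if (!parse_command(p)) {
			return 0;
		}
	}
	emit(p, ")", 1);
	return 1;
}

/* list := pipeline (';' pipeline)* ; must consume the whole input. */
static int parse_list(struct parser *p) {
	emit(p, "(list ", 6);
	if (!parse_pipeline(p)) {
		return 0;
	}
	while (eat(p, ';')) {
		emit(p, " ", 1);
		if (!parse_pipeline(p)) {
			return 0;
		}
	}
	emit(p, ")", 1);
	skip_blanks(p);
	return *p->cur == '\0';
}

/* Parses src; on success copies the rendered tree into out and returns 1. */
int parse(const char *src, char *out, size_t outsz) {
	struct parser p = { .cur = src, .len = 0 };
	if (!parse_list(&p)) {
		return 0;
	}
	snprintf(out, outsz, "%s", p.out);
	return 1;
}
```

Under these assumptions, `parse("echo hi | wc -l; true", buf, sizeof(buf))` returns 1 and fills `buf` with `(list (pipe (cmd echo hi) (cmd wc -l)) (pipe (cmd true)))`, while an input like `echo |` returns 0 because the command rule finds no word after the pipe. A real shell grammar adds quoting, expansions, and compound commands, but each of those is just another rule function.
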
Most of mrsh's AST nodes become a *task*. Each task provides an implementation of
the following interface:

```c
struct task_interface {
	/**
	 * Request a status update from the task. This starts or continues it.
	 * `poll` must return without blocking with the current task's status:
	 *
	 * - TASK_STATUS_WAIT in case the task is pending
	 * - TASK_STATUS_ERROR in case a fatal error occurred
	 * - A positive (or null) code in case the task finished
	 *
	 * `poll` will be called over and over until the task goes out of the
	 * TASK_STATUS_WAIT state. Once the task is no longer in progress, the
	 * returned state is cached and `poll` won't be called anymore.
	 */
	int (*poll)(struct task *task, struct context *ctx);
	void (*destroy)(struct task *task);
};
```

Most of the time the task will just do its thing. Many tasks have sub-tasks
as well, such as a command list executing a list of commands, or each branch of
an if statement, which they can defer to with `task_poll`. Many tasks wait on
an external process, in which case they can return TASK_STATUS_WAIT to have the
process `wait`ed on. Feel free to browse the [full list of tasks][tasks] to get
an idea.

[tasks]: https://git.sr.ht/~emersion/mrsh/tree/master/shell/task

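The polling model described above can be sketched roughly as follows. Only `struct task_interface`, `task_poll`, and the `TASK_STATUS_*` names come from the text; the numeric status values, the `struct task` layout, and the `task_countdown` and `task_list` types are hypothetical stand-ins rather than mrsh's actual definitions. The countdown task fakes "waiting on a process" by returning TASK_STATUS_WAIT a fixed number of times.

```c
#include <stddef.h>

/* Assumed values: any negative codes work, since exit codes are >= 0. */
#define TASK_STATUS_WAIT  -1
#define TASK_STATUS_ERROR -2

struct context; /* opaque in this sketch */
struct task;

struct task_interface {
	int (*poll)(struct task *task, struct context *ctx);
	void (*destroy)(struct task *task);
};

struct task {
	const struct task_interface *impl;
	int status; /* cached result; TASK_STATUS_WAIT until finished */
};

void task_init(struct task *task, const struct task_interface *impl) {
	task->impl = impl;
	task->status = TASK_STATUS_WAIT;
}

/* Polls only while the task is pending, then returns the cached status. */
int task_poll(struct task *task, struct context *ctx) {
	if (task->status == TASK_STATUS_WAIT) {
		task->status = task->impl->poll(task, ctx);
	}
	return task->status;
}

/* Hypothetical leaf task: stays pending for `ticks` polls, then exits. */
struct task_countdown {
	struct task task; /* first member, so the pointers can be cast */
	int ticks;
	int exit_code;
};

static int countdown_poll(struct task *task, struct context *ctx) {
	struct task_countdown *tc = (struct task_countdown *)task;
	(void)ctx;
	if (tc->ticks > 0) {
		tc->ticks--;
		return TASK_STATUS_WAIT;
	}
	return tc->exit_code;
}

static const struct task_interface countdown_impl = { countdown_poll, NULL };

/* Hypothetical list task: runs children in order, yields the last status. */
struct task_list {
	struct task task;
	struct task **children;
	size_t len, cur;
};

static int list_poll(struct task *task, struct context *ctx) {
	struct task_list *tl = (struct task_list *)task;
	int ret = 0;
	while (tl->cur < tl->len) {
		ret = task_poll(tl->children[tl->cur], ctx);
		if (ret < 0) {
			return ret; /* WAIT or ERROR: hand it up without blocking */
		}
		tl->cur++;
	}
	return ret;
}

static const struct task_interface list_impl = { list_poll, NULL };

/* Drives a two-command list to completion; returns the number of polls. */
int demo(int *final_status) {
	struct task_countdown a = { .ticks = 2, .exit_code = 0 };
	struct task_countdown b = { .ticks = 1, .exit_code = 3 };
	task_init(&a.task, &countdown_impl);
	task_init(&b.task, &countdown_impl);
	struct task *children[] = { &a.task, &b.task };
	struct task_list list = { .children = children, .len = 2, .cur = 0 };
	task_init(&list.task, &list_impl);

	int polls = 0;
	do {
		*final_status = task_poll(&list.task, NULL);
		polls++;
	} while (*final_status == TASK_STATUS_WAIT);
	return polls;
}
```

In this sketch `demo` needs four polls and finishes with status 3, the exit code of the last child. The loop here spins for simplicity; a real shell would presumably block in `wait` for a child process between polls instead of polling again immediately.
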
One concern more specific to POSIX shells is built-in commands. Some commands
have to be built-in because they manipulate the shell's state, such as `.` and
`cd`. Others, like `true` and `false`, are there for performance reasons, since
they're simple and easily implemented internally. POSIX specifies [a list of
special builtins][builtins] which are necessary to implement in the shell
itself. There's [a second list][utilities] that must be present for the shell
environment to be considered POSIX compatible (plus some reserved names like
`local` and `pushd` that invoke undefined behavior; mrsh aborts on these).

[builtins]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_14
[utilities]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_09_01_01

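To see why a command like `cd` has to live inside the shell, here is a small sketch using plain POSIX calls. The function names are made up for this example and have nothing to do with mrsh's code. The first function does what an external `cd` program would do, changing directory in a forked child, and then checks whether the parent noticed; the second does what a built-in does.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Changes directory in a forked child, the way an external `cd` program
 * would, then reports whether the parent's working directory moved.
 * Returns 1 if the parent was unaffected, 0 if it changed, -1 on error.
 */
int child_chdir_leaves_parent_alone(const char *dir) {
	char before[4096], after[4096];
	if (getcwd(before, sizeof(before)) == NULL) {
		return -1;
	}

	pid_t pid = fork();
	if (pid < 0) {
		return -1;
	}
	if (pid == 0) {
		/* Child: chdir succeeds here, but only for this process. */
		_exit(chdir(dir) == 0 ? 0 : 1);
	}

	int status;
	if (waitpid(pid, &status, 0) < 0) {
		return -1;
	}
	if (!WIFEXITED(status) || WEXITSTATUS(status) != 0) {
		return -1;
	}

	if (getcwd(after, sizeof(after)) == NULL) {
		return -1;
	}
	return strcmp(before, after) == 0;
}

/* What a built-in does instead: change directory in the shell process. */
int builtin_cd(const char *dir) {
	return chdir(dir) == 0 ? 0 : 1;
}
```

The working directory is per-process state, so the child's `chdir` disappears when the child exits. The same reasoning applies to `.`, which has to run commands in the current shell environment rather than in a child.
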
Here are some links to more interesting parts of the code so you can explore on
your own:

- [Redirection](https://git.sr.ht/~emersion/mrsh/tree/master/shell/redir.c) & [pipelines](https://git.sr.ht/~emersion/mrsh/tree/master/shell/task/pipeline.c)
- [Function definition](https://git.sr.ht/~emersion/mrsh/tree/master/shell/task/function_definition.c) & [execution](https://git.sr.ht/~emersion/mrsh/tree/master/shell/task/command_function.c)
- [The . builtin](https://git.sr.ht/~emersion/mrsh/tree/master/builtin/dot.c)
- [main.c and the REPL](https://git.sr.ht/~emersion/mrsh/tree/master/main.c)

I might write more articles in the future diving into specific concepts. Feel
free to shoot me an email if you have suggestions. Shoutout to Simon for
building such a cool project! I'm looking forward to contributing more until we
have a really nice strictly POSIX shell.