Anatomy-of-a-shell.md (5479B)
- ---
- date: 2018-12-28
- layout: post
- title: Anatomy of a shell
- tags: ["shell"]
- ---
- I've been contributing where I can to Simon Ser's [mrsh][mrsh] project, a
- work-in-progress strictly POSIX shell implementation. I worked on some small
- mrsh features during my holiday travels and it's in the forefront of my mind, so
- I'd like to share some of its design details with you.
- [mrsh]: https://git.sr.ht/~emersion/mrsh
- There are two main components to a shell: parsing and execution. mrsh uses a
- simple [recursive descent parser][rd-parser] to generate an AST (Abstract Syntax
- Tree, or an in-memory model of the structure of the parsed source). This design
- was chosen to simplify the code and avoid dependencies like flex/bison, and is a
- good choice given that performance isn't critical for parsing shell scripts.
- Here's an example of the input source and output AST:
- [rd-parser]: https://en.wikipedia.org/wiki/Recursive_descent_parser
- ```sh
- #!/bin/sh
- say_hello() {
- echo "hello $1!"
- }
- who=$(whoami)
- say_hello "$who"
- ```
- This script is parsed into this AST (this is the output of `mrsh -n test.sh`):
- ```
- program
- program
- └─command_list ─ pipeline
- └─function_definition say_hello ─ brace_group
- └─command_list ─ pipeline
- └─simple_command
- ├─name ─ word_string [3:2 → 3:6] echo
- └─argument 1 ─ word_list (quoted)
- ├─word_string [3:8 → 3:14] hello
- ├─word_parameter
- │ └─name 1
- └─word_string [3:16 → 3:17] !
- program
- program
- └─command_list ─ pipeline
- └─simple_command
- └─assignment
- ├─name who
- └─value ─ word_command ─ program
- └─command_list ─ pipeline
- └─simple_command
- └─name ─ word_string [6:7 → 6:13] whoami
- program
- └─command_list ─ pipeline
- └─simple_command
- ├─name ─ word_string [7:1 → 7:10] say_hello
- └─argument 1 ─ word_list (quoted)
- └─word_parameter
- └─name who
- ```
- Most of these names come directly from the [POSIX shell specification][spec].
- The parser and AST is made available as a standalone public interface of
- libmrsh, which can be used for a variety of use-cases like syntax-aware text
- editors, syntax highlighting (see [`highlight.c`][hl.c]), linters, etc. The most
- important use-case is, of course, task planning and execution.
- [spec]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html
- [hl.c]: https://git.sr.ht/~emersion/mrsh/tree/master/highlight.c
- Most of these AST nodes becomes a *task*. A task defines an implementation of
- the following interface:
- ```c
- struct task_interface {
- /**
- * Request a status update from the task. This starts or continues it.
- * `poll` must return without blocking with the current task's status:
- *
- * - TASK_STATUS_WAIT in case the task is pending
- * - TASK_STATUS_ERROR in case a fatal error occured
- * - A positive (or null) code in case the task finished
- *
- * `poll` will be called over and over until the task goes out of the
- * TASK_STATUS_WAIT state. Once the task is no longer in progress, the
- * returned state is cached and `poll` won't be called anymore.
- */
- int (*poll)(struct task *task, struct context *ctx);
- void (*destroy)(struct task *task);
- };
- ```
- Most of the time the task will just do its thing. Many tasks will have sub-tasks
- as well, such as a command list executing a list of commands, or each branch of
- an if statement, which it can defer to with `task_poll`. Many tasks will wait on
- an external process, in which case it can return TASK_STATUS_WAIT to have the
- process `wait`ed on. Feel free to browse the [full list of tasks][tasks] to get
- an idea.
- [tasks]: https://git.sr.ht/~emersion/mrsh/tree/master/shell/task
- One concern more specific to POSIX shells is built-in commands. Some commands
- have to be built-in because they manipulate the shell's state, such as `.` and
- `cd`. Others, like `true` & `false`, are there for performance reasons, since
- they're simple and easily implemented internally. POSIX specifies [a list of
- special builtins][builtins] which are necessary to implement in the shell
- itself. There's [a second list][utilities] that must be present for the shell
- environment to be considered POSIX compatible (plus some reserved names like
- `local` and `pushd` that invoke undefined behavior - mrsh aborts on these).
- [builtins]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_14
- [utilities]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_09_01_01
- Here are some links to more interesting parts of the code so you can explore on
- your own:
- - [Redirection](https://git.sr.ht/~emersion/mrsh/tree/master/shell/redir.c) & [pipelines](https://git.sr.ht/~emersion/mrsh/tree/master/shell/task/pipeline.c)
- - [Function definition](https://git.sr.ht/~emersion/mrsh/tree/master/shell/task/function_definition.c) & [execution](https://git.sr.ht/~emersion/mrsh/tree/master/shell/task/command_function.c)
- - [The . builtin](https://git.sr.ht/~emersion/mrsh/tree/master/builtin/dot.c)
- - [main.c and the REPL](https://git.sr.ht/~emersion/mrsh/tree/master/main.c)
- I might write more articles in the future diving into specific concepts, feel
- free to shoot me an email if you have suggestions. Shoutout to Simon for
- building such a cool project! I'm looking forward to contributing more until we
- have a really nice strictly POSIX shell.