drewdevault.com

[mirror] blog and personal website of Drew DeVault git clone https://hacktivis.me/git/mirror/drewdevault.com.git

---
title: The rc shell and its excellent handling of whitespace
date: 2023-07-31
---

*This blog post is a response to Mark Dominus' "[The shell and its crappy handling of whitespace](https://blog.plover.com/Unix/whitespace.html)"*.

I've been working on a shell for Unix-like systems called
[rc](https://git.sr.ht/~sircmpwn/rc), which draws heavily from the Plan 9 shell
[of the same name](http://man.9front.org/1/rc). When I saw Mark's post about the
perils of whitespace in POSIX shells (or derived shells, like bash), I thought
it prudent to see if any of the problems he outlines are present in the shell
I'm working on myself. Good news: they aren't!

Let's go over each of his examples. First he provides the following example:
```
for i in *.jpg; do
	cp $i /tmp
done
```

This breaks if there are spaces in the filenames. Not so with rc:

```
% cat test.rc
for (i in *.jpg) {
	cp $i subdir
}
% ls
a.jpg b.jpg 'bite me.jpg' c.jpg subdir test.rc
% rc ./test.rc
% ls subdir/
a.jpg b.jpg 'bite me.jpg' c.jpg
```
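
For contrast, here is a minimal POSIX sh sketch (not rc) of why the original loop breaks. The `count_args` helper is made up for this illustration; it only reports how many arguments a command would receive. With the default `IFS`, an unquoted `$i` is split on whitespace, so `cp` would be handed two file names instead of one.

```sh
#!/bin/sh
# Hypothetical helper: print the number of arguments received.
count_args() {
	echo $#
}

i='bite me.jpg'

# Unquoted: the shell splits $i on whitespace into "bite" and "me.jpg".
count_args $i      # prints 2

# Quoted: the expansion stays a single word.
count_args "$i"    # prints 1
```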

He gives a similar example for a script that renames jpeg to jpg:

```
for i in *.jpeg; do
	mv $i $(suf $i).jpg
done
```

This breaks for similar reasons, but works fine in rc:

```
% cat test.rc
fn suf(fname) {
	echo $fname | sed -e 's/\..*//'
}
for (i in *.jpeg) {
	mv $i `{suf $i}.jpg
}
% ls
a.jpeg b.jpeg 'bite me.jpeg' c.jpeg test.rc
% rc ./test.rc
% ls
a.jpg b.jpg 'bite me.jpg' c.jpg test.rc
```
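
For comparison, this is roughly what the rename loop needs to look like in POSIX sh to survive spaces. It is a sketch under a couple of assumptions: the `rename_jpeg` function name and the scratch directory are invented for the example, and it swaps Mark's `suf` helper for the standard `${i%.jpeg}` suffix-removal expansion. Every expansion has to be quoted by hand.

```sh
#!/bin/sh
# Rename every *.jpeg in directory $1 to *.jpg (hypothetical helper).
rename_jpeg() {
	for i in "$1"/*.jpeg; do
		# If nothing matches, the pattern is left as a literal string; skip it.
		[ -e "$i" ] || continue
		# Quote both expansions so a name like "bite me.jpeg" stays one word.
		mv "$i" "${i%.jpeg}.jpg"
	done
}

# Try it in a scratch directory.
dir=$(mktemp -d)
touch "$dir/a.jpeg" "$dir/bite me.jpeg"
rename_jpeg "$dir"
ls "$dir"
```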

There are other shells, such as fish or zsh, which also have answers to these
problems which don't necessarily call for generous quoting like other shells
often do. rc is much simpler than these shells. At the moment it clocks in at
just over 3,000 lines of code, compared to fish at ~45,000 and zsh at ~144,000.
Admittedly, it's not done yet, but I would be surprised to see it grow beyond
5,000 lines for version 1.0.[^1]

[^1]: Also worth noting that these line counts are, to some extent, comparing
    apples to oranges given that fish, zsh, and rc are written respectively in
    C++/Rust, C, and Hare.

The key to rc's design success in this area is the introduction of a second
primitive. The Bourne shell and its derivatives traditionally work with only one
primitive: strings. But command lines are made of *lists* of strings, and so a
language which embodies the primitives of the command line ought to also be able
to represent those as a first-class feature. In traditional shells a list of
strings is denoted inline with the use of spaces within those strings, which
raises obvious problems when the members themselves contain spaces; see Mark's
post detailing the errors which ensue. rc adds lists of strings as a formal
primitive alongside strings.

```
% args=(ls --color /)
% echo $args(1)
ls
% echo $args(2)
--color
% echo $#args
3
% $args
bin dev home lost+found mnt proc run srv swap tmp var
boot etc lib media opt root sbin storage sys usr
% args=("foo bar" baz)
% touch $args
% ls
baz 'foo bar'
```
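
To see what the single-primitive model costs, here is a small POSIX sh sketch (again not rc). The `count_args` helper and the variable name are invented for illustration. A "list" kept in a string loses its member boundaries, and the positional parameters set with `set --` are the closest thing plain POSIX sh has to a real list, usable only with careful quoting.

```sh
#!/bin/sh
# Hypothetical helper: print the number of arguments received.
count_args() {
	echo $#
}

# A "list" stored in a string: member boundaries are just spaces, so the
# intended members "foo bar" and "baz" can no longer be told apart.
args='foo bar baz'
count_args $args    # prints 3

# The positional parameters keep the boundaries, as long as "$@" is quoted.
set -- 'foo bar' baz
count_args "$@"     # prints 2
```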

Much better, right? One simple change eliminates the need for quoting virtually
everywhere. Strings can contain spaces and nothing melts down.

Let me run down the remaining examples from Mark's post and demonstrate their
non-importance in rc. First, regarding $\*, it just does what you expect.

```
% cat yell.rc
#!/bin/rc
shift
echo I am about to run $* now!!!
exec $*
% ls *.jpg
'bite me.jpg'
% ./yell.rc ls *.jpg
I am about to run ls bite me.jpg now!!!
'bite me.jpg'
```

Note that there is no need to quote the arguments to "echo" here. The script
also calls shift first, because $\* includes $0 in rc.
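
The POSIX sh counterpart shows the behavior Mark was complaining about. This is a sketch with made-up function names (`yell_unquoted`, `yell_quoted`, `count_args`), written as functions rather than a script so that it is self-contained. An unquoted `$*` is split again on whitespace before the command runs, while `"$@"` passes each argument through intact.

```sh
#!/bin/sh
# Hypothetical helper: print the number of arguments received.
count_args() {
	echo $#
}

# Unquoted $* is re-split on whitespace before the command runs.
yell_unquoted() {
	echo "I am about to run $* now!!!"
	$*
}

# "$@" hands every argument to the command as its own word.
yell_quoted() {
	echo "I am about to run $* now!!!"
	"$@"
}

yell_unquoted count_args 'bite me.jpg'   # second line of output: 2
yell_quoted count_args 'bite me.jpg'     # second line of output: 1
```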

Finally, let's rewrite Mark's "lastdl" program in rc and show how it works fine
in rc's interactive mode.

```
#!/bin/rc
cd $HOME/downloads
echo $HOME/downloads/`{ls -t | head -n1}
```

Its use at the command line works just fine without quotes.

```
% file `{lastdl}
/home/sircmpwn/downloads/test image.jpg: JPEG image data, JFIF standard 1.01,
aspect ratio, density 1x1, segment length 16, baseline, precision 8,
5000x2813, components 3
```

Just for fun, here's another version of this rc script that renames the file if
its name contains spaces, replacing them with underscores, like the last
example in Mark's post:

```
#!/bin/rc
cd $HOME/downloads
last=`{ls -t | head -n1}
if (~ $last '* *') {
	newname=`{echo $last | tr ' \t' '_'}
	mv $last $HOME/downloads/$newname
	last=$newname
}
echo $HOME/downloads/$last
```

The only quotes to be found are those which escape the wildcard match testing
for a space in the string.[^2] Not bad, right? Like Plan 9's rc, my shell
imagines a new set of primitives for shells, then starts from the ground up and
builds a shell which works better in most respects while still being very
simple. Most of the problems that have long plagued us with respect to sh, bash,
etc, are solved in a simple package with rc, alongside a nice interactive mode
reminiscent of the best features of fish.

rc is a somewhat complete shell today, but there is a bit more work to be done
before it's ready for 1.0, most pressingly with respect to signal handling and
job control, alongside a small bit of polish and easier features to implement
(such as subshells, IFS, etc). Some features which are likely to be omitted, at
least for 1.0, include logical comparisons and arithmetic expansion (for which
/bin/test and /bin/dc are recommended respectively). Of course, rc is destined
to become the primary shell of the [Ares operating system](https://ares-os.org)
project that I've been working on, but I have designed it to work on Unix as
well.

Check it out!

[^2]: This is a bit of a fib. In fact, globbing is disabled when processing the
    args of the ~ built-in. However, the quotes are, ironically, required to
    escape the space between the \* characters, so it's one argument rather than
    two.