drewdevault.com

[mirror] blog and personal website of Drew DeVault git clone https://hacktivis.me/git/mirror/drewdevault.com.git

awk-is-the-coolest-tool-you-dont-know.gmi (6437B)


awk, named for its authors Aho, Weinberger, and Kernighan, is a very cool little tool that you know exists and is installed on your system, but have never bothered to learn how to use. I’m here to tell you that you really ought to!
If I stop for a moment to ponder the question, “what is the coolest tool in Unix?”, the immediate answer is awk. If I insist on pondering it for longer, giving each tool a moment for fair evaluation, the answer is still awk.¹ There are few tools as perfectly suited to their problem as awk is.
I’m not going to explain awk in depth, because there are already plenty of other resources for that. If you are totally unfamiliar with awk, though, here’s a very brief summary:
> awk reads a plaintext file as a list of newline-separated records of whitespace-separated columns. It then matches each record against a set of rules (usually defined by regular expressions) and performs the actions listed by each matching rule (such as summing or averaging a column, reformatting the output, or any number of other things).
>
> In short, awk is a domain-specific language which reads the kind of files you probably have a lot of already, then mutates them or computes something interesting from them.
=> https://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html "awk" in the POSIX specification
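To make that summary concrete, here is a minimal sketch of the record/rule model. The sample data and the function name summarize are made up for illustration; nothing here comes from a real script.

```shell
# summarize is a hypothetical name; it reads "name category amount" records.
summarize() {
    awk '
    # Rule with a regex pattern: runs only for records containing "food".
    /food/ { total += $3 }
    # Rule with no pattern: runs for every record. NR is the record number.
    { print NR ": " $1 }
    # END runs once, after the last record has been read.
    END { print "food total: " total }
    '
}

# Made-up sample input, one record per line.
printf '%s\n' 'coffee food 3' 'train travel 12' 'lunch food 9' | summarize
```

Every record is tested against every rule in order, so a line can trigger more than one action.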
Instead of teaching you how to use it, I’m just going to tell you about some things I’ve used awk for, in an effort to convince you that it’s cool.
Please hold while I hastily run history | grep awk. Wait, no, it would be more ironic to run history | awk '/awk/ { print $0 }' instead. One sec…
## Rewriting references in comments
I had some code like this:
```
// Duplicates a [dirent] object. Call [dirent_free] to get rid of it later.
export fn dirent_dup(e: *dirent) dirent = { // ...
```
It needed to become this:
```
// Duplicates a [[dirent]] object. Call [[dirent_free]] to get rid of it later.
export fn dirent_dup(e: *dirent) dirent = { // ...
```
So the following awk command looks for lines which are comments, then globally substitutes every match of a regex with another string. In the replacement, & stands for the matched text, so each [name] becomes [[name]]. The character class includes _ so that names like dirent_free match too:
```
awk '/^\/\// { gsub(/\[[a-zA-Z_:]+\]/, "[&]") } { print $0 }' < input
```
In this manner I used it as a kind of modified sed which only operates on certain lines.
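A quick way to see that only comment lines are touched is to feed the command one comment line and one code line. This is a sketch on made-up input: fix_refs is a hypothetical wrapper name, and its character class also accepts digits and underscores, which is an assumption about what identifiers look like.

```shell
# fix_refs is a hypothetical wrapper around the command from this section.
fix_refs() {
    awk '/^\/\// { gsub(/\[[a-zA-Z0-9_:]+\]/, "[&]") } { print }'
}

# The first line starts with // and is rewritten; the second is code and is not.
printf '%s\n' \
    '// Call [dirent_free] when done with a [dirent].' \
    'x = items[i]; // trailing comment with [ref]' |
fix_refs
```

Note that this is not idempotent: running it a second time wraps the references again, because [[dirent]] still contains [dirent].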
## Extracting data from one script to use in another script
I have a script which includes something like this:
```
modules="ascii
bufio
bytes
compress_flate
compress_zlib
crypto_blake2b
crypto_math
crypto_random
crypto_md5
crypto_sha1
crypto_sha256
crypto_sha512
# ...
uuid"
```
I wanted to extract this list of names from the script, and replace _ with :: for all of them. awk to the rescue!
```
# Yes, I am entirely aware that this is a hack
modules=$(awk '
# "next" matters here: after sub(), $0 is a bare name, which would also
# match the second rule and be printed twice.
/^modules="/ { sub(/.*=\"/, ""); gsub(/_/, "::"); print $1; mods = 1; next }
/^[a-z][a-z0-9_]+$/ { if (mods == 1) { gsub(/_/, "::"); print $1 } }
/^[a-z][a-z0-9_]+"$/ { if (mods == 1) { mods = 0; sub(/"/, ""); gsub(/_/, "::"); print $1 } }
' < scripts/gen-stdlib)
```
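The same idea can be written as a tiny state machine with a single flag. This is a variant sketch, not the script above: extract_modules and inlist are hypothetical names, and it assumes the list starts on a line beginning with modules=" and ends at the first line with a closing quote.

```shell
# extract_modules is a hypothetical name for this variant.
extract_modules() {
    awk '
    # Entering the list: set the flag and strip the assignment prefix.
    /^modules="/ { inlist = 1; sub(/^modules="/, "") }
    inlist {
        # sub returns 1 if it removed a closing quote, i.e. this is the last entry.
        last = sub(/"$/, "")
        # Skip anything that is not a bare name, such as comment lines.
        if ($0 ~ /^[a-z][a-z0-9_]*$/) { gsub(/_/, "::"); print }
        if (last) inlist = 0
    }
    '
}

# Made-up stand-in for the real script.
printf '%s\n' '#!/bin/sh' 'modules="ascii' 'crypto_sha256' '# ...' 'uuid"' 'echo done' |
extract_modules
```

Because the first rule edits $0 before the second rule sees it, one block handles the first, middle, and last entries alike.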
## Adding syntax highlighting to patches
My email client, aerc, lets you pipe emails into an arbitrary command to format them nicely for displaying in your terminal. One kind of email I get often is a patch, with a diff dropped directly into the email. I wrote this awk script to add ANSI colors to such an email:
```
BEGIN {
	bright = "\x1B[1m"
	red = "\x1B[31m"
	green = "\x1B[32m"
	cyan = "\x1B[36m"
	reset = "\x1B[0m"
	hit_diff = 0
}
{
	if (hit_diff == 0) {
		# Before the first "diff" line: email headers, message, and diffstat
		# Strip carriage returns from line
		gsub(/\r/, "", $0)
		if ($0 ~ /^diff /) {
			hit_diff = 1
			print bright $0 reset
		} else if ($0 ~ /^.*\|.*(\+|-)/) {
			# Diffstat line: color only the part from the "|" onwards.
			# substr positions start at 1, not 0.
			left = substr($0, 1, index($0, "|")-1)
			right = substr($0, index($0, "|"))
			gsub(/-+/, red "&" reset, right)
			gsub(/\++/, green "&" reset, right)
			print left right
		} else {
			print $0
		}
	} else {
		# Inside the diff itself
		# Strip carriage returns from line
		gsub(/\r/, "", $0)
		if ($0 ~ /^-/) {
			print red $0 reset
		} else if ($0 ~ /^\+/) {
			print green $0 reset
		} else if ($0 ~ /^ /) {
			print $0
		} else if ($0 ~ /^@@ (-[0-9]+,[0-9]+ \+[0-9]+,[0-9]+) @@.*/) {
			sub(/^@@ (-[0-9]+,[0-9]+ \+[0-9]+,[0-9]+) @@/, cyan "&" reset)
			print $0
		} else {
			print bright $0 reset
		}
	}
}
```
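Stripped down to its core, the diff-body half of that script is three rules. Here is a reduced sketch in pattern-action form: color_diff is a hypothetical name, the escapes are written in octal (\033) since that form is specified by POSIX, and it skips the header, diffstat, and hunk handling entirely.

```shell
# color_diff is a hypothetical name; it colors removed and added lines only.
color_diff() {
    awk '
    BEGIN { red = "\033[31m"; green = "\033[32m"; reset = "\033[0m" }
    # "next" stops processing this record so the catch-all rule is skipped.
    /^-/  { print red $0 reset; next }
    /^\+/ { print green $0 reset; next }
    { print }
    '
}

printf '%s\n' ' unchanged' '-old line' '+new line' | color_diff
```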
## Pulling a specific column out of another command
This is the most basic use of awk. git ls-tree returns something like this:
```
100644 blob aa61a6c84fa215178b560e2bddcdcb18bf62ccc7 .build.yml
100644 blob 73ab8769f93bbbd5c4b69d33c2fa86329d05bc85 .gitignore
100644 blob 65d4d3ae9206f664e72c49ffed1489414852e637 LICENSE
100644 blob 6fd05a7d17471026df258d9931309a19ac286c5f README.md
040000 tree dfdc471efbd87a131fd7fe41706debdb48411ebe assets
100644 blob ede1a81e5b44031d95e315985ed7e7831067d609 config.toml
040000 tree a8ab3f0c2db376725d480c673e289d654d289acc content
040000 tree 7b1a4966c4143bcea991e9f77620cb4fda887d66 layouts
040000 tree e4dfc4f7500e111652aa7880002252b47239a2d0 static
100644 blob f31711885de4cd43571ee633b553016b766d3ec1 webring-in.template
```
Recently I was looking for comments on the first line of any file in my git repository, so:
git ls-tree -r HEAD | awk '{ print $4 }' | xargs -n1 sed 1q | grep '//' | less
I include this to demonstrate some restraint. The xargs, sed, and grep commands in this pipeline could all have been incorporated into awk, but it’s simpler not to.
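For comparison, here is roughly what folding those three commands into awk could look like. It is a sketch under assumptions: first_line_comments is a hypothetical name, paths are assumed to contain no whitespace (so $4 is the whole path), and unlike the pipeline it prefixes each hit with the file name.

```shell
# first_line_comments is a hypothetical name. It reads ls-tree style records
# on stdin and opens each listed file itself instead of using xargs and sed.
first_line_comments() {
    awk '{
        path = $4
        # getline returns 1 when a line was read, 0 at EOF, -1 on error.
        if ((getline line < path) > 0 && line ~ /\/\//)
            print path ": " line
        # Close each file so long listings do not exhaust open file handles.
        close(path)
    }'
}

# Demo in a throwaway directory with made-up files and object names.
(
    cd "$(mktemp -d)" || exit 1
    printf '%s\n' '// license header' 'int x;' > a.c
    printf '%s\n' 'int y;' > b.c
    printf '%s\n' '100644 blob 0000000 a.c' '100644 blob 0000000 b.c' |
    first_line_comments
)
```

It works, but the getline and close bookkeeping is exactly the kind of thing the pipeline version gets for free.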
## Numbering lines from stdin
Sometimes I have a file and I want it to have line numbers. So, I wrote a little shell one-liner that does the job:
```
$ cat ~/bin/lineno
exec awk '{ print NR "\t" $0 }'
$ lineno < /etc/os-release
1 NAME="Alpine Linux"
2 ID=alpine
3 VERSION_ID=3.14.0_alpha20210212
4 PRETTY_NAME="Alpine Linux edge"
5 HOME_URL="https://alpinelinux.org/"
6 BUG_REPORT_URL="https://bugs.alpinelinux.org/"
```
## In conclusion
You’re doing yourself a disservice if you don’t know how to use awk. Awk is only applicable to a certain kind of problem, but it’s a problem you’ll encounter more often than you think. Plus, once you get thinking in awk terms, you’ll find yourself subtly formatting your data in awk-friendly ways :) Learn it!
¹ Though special mention goes to ar (I dunno), cut (it’s useful), dd (for being a silly wart), ed (for not being installed on anyone’s system by default anymore, which pisses me off), and fort77 (for being specified by POSIX for some reason).