drewdevault.com

[mirror] blog and personal website of Drew DeVault git clone https://hacktivis.me/git/mirror/drewdevault.com.git

---
title: Tips for debugging your new programming language
date: 2021-08-11
---

Say you're building a new (compiled) programming language from scratch. You'll
inevitably have to debug programs written in it, and worse, many of these
problems will lead you into deep magic, as you uncover bugs in your
compiler or runtime. And as you find yourself diving into the arcane arts, your
tools may be painfully lacking: how do you debug code written in a language for
which debuggers and other tooling simply have not been written yet?

In the implementation of my own programming language, I have faced this problem
many times, and developed, by necessity, some skills around debugging with
crippled tools that may lack an awareness of your language. Of course, the
ultimate goal is to build out first-class debugging support, but we must have a
language in the first place before we can write tools to debug it. If you find
yourself in this situation, here are my recommendations.

First, I'll echo the timeless words of Brian Kernighan:

> The most effective debugging tool is still careful thought, coupled with
> judiciously placed print statements.

From *Unix for Beginners* (1979).

Classic debugging techniques are of heightened importance in this environment:
first seek to isolate the problem code, then to understand the problem code,
then form and test a hypothesis, usually with a thoughtful print
statement. Often, this is enough.

Unfortunately, you may have to fire up gdb. gdb is often painful in the best of
situations, but if you have to use it without debug symbols, you may find
yourself shutting off the computer and seeking out rural real estate on which
you can establish a new career in farming. If you can stomach it, I can offer
some advice.

First, you're going to be working in assembly, so make sure you're familiar with
how it works. I would recommend keeping the ISA manual and your ABI
specification handy. If you're smart and your language sets up stack frames
properly (this is easy, do it early), you should at least have a backtrace,
breakpoints at functions, and globals, though all of these will be untyped. You
can write C casts to add some ad-hoc types to examine data in your process,
like "print *(int *)$rdi".

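To make the cast concrete, here is a minimal C sketch of what such an expression does: it takes an untyped address and reinterprets the bytes found there. The `slice_header` layout and the `peek_int` and `peek_slice` names are hypothetical, made up for illustration; your language's real layouts come from your own ABI.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory layout of a slice; not taken from any real ABI. */
struct slice_header {
    void *data;
    size_t length;
    size_t capacity;
};

/* What "print *(int *)$rdi" does: read sizeof(int) bytes at an untyped
 * address and interpret them as an int. memcpy avoids the alignment and
 * aliasing rules that C cares about and gdb does not. */
int peek_int(uintptr_t addr) {
    int value;
    memcpy(&value, (const void *)addr, sizeof value);
    return value;
}

/* The same reinterpretation applied to a whole aggregate in one read. */
struct slice_header peek_slice(uintptr_t addr) {
    struct slice_header hdr;
    memcpy(&hdr, (const void *)addr, sizeof hdr);
    return hdr;
}
```

Without type information gdb cannot name the fields for you, so under this assumed layout on x86_64 you would read them one at a time, for example "print *(long *)($rdi + 8)" for the length.
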
You'll also get used to the 'x' command, which eXamines memory. The command
format is "x/NFU", where N is the number of objects, F is the display format (x
for hexadecimal, d for decimal, and so on), and U is the unit size: w for word
(4 bytes, an int), g for giant word (8 bytes, a long on x86_64), and h and b for
halfword (2 bytes, a short) and byte, respectively. "x/8xg $rdi" will interpret
rdi as an address where 8 longs are stored and print them out in hexadecimal; if
you omit the format, gdb reuses the last one, which starts out as hexadecimal.
Of particular use is the "i" format, for "instruction", which will disassemble
from the given address:

```
(gdb) x/8i $rip
=> 0x5555555565c8 <rt.memcpy+4>: mov $0x0,%eax
   0x5555555565cd <rt.memcpy+9>: cmp %rdx,%rax
   0x5555555565d0 <rt.memcpy+12>: jae 0x5555555565df <rt.memcpy+27>
   0x5555555565d2 <rt.memcpy+14>: movzbl (%rsi,%rax,1),%ecx
   0x5555555565d6 <rt.memcpy+18>: mov %cl,(%rdi,%rax,1)
   0x5555555565d9 <rt.memcpy+21>: add $0x1,%rax
   0x5555555565dd <rt.memcpy+25>: jmp 0x5555555565cd <rt.memcpy+9>
   0x5555555565df <rt.memcpy+27>: leave
```

You can set breakpoints on the addresses you find here (e.g. "b
*0x5555555565d0"), and step through one instruction at a time with the "si"
command.

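If the unit letters of the 'x' command feel abstract, it may help to see roughly what a giant-word dump amounts to. The following is a sketch in C, not gdb's actual implementation; `giant_at` and `examine_giants` are hypothetical names, and the two-values-per-row layout only approximates what gdb prints for "x/8xg".

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Fetch the i-th 8-byte "giant word" starting at addr, in host byte order.
 * memcpy sidesteps alignment and aliasing problems. */
uint64_t giant_at(const void *addr, size_t i) {
    uint64_t v;
    memcpy(&v, (const unsigned char *)addr + i * sizeof v, sizeof v);
    return v;
}

/* Rough equivalent of "x/Nxg addr": two giant words per row, in hex,
 * each row prefixed with the address of its first word. */
void examine_giants(FILE *out, const void *addr, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (i % 2 == 0) {
            uintptr_t row = (uintptr_t)addr + i * sizeof(uint64_t);
            fprintf(out, "%p:", (void *)row);
        }
        fprintf(out, "\t0x%016" PRIx64, giant_at(addr, i));
        if (i % 2 == 1 || i + 1 == n)
            fputc('\n', out);
    }
}
```

Calling `examine_giants(stderr, ptr, 8)` from a C runtime gives you a poor man's "x/8xg" at points where attaching gdb is inconvenient. Swapping `uint64_t` for `uint32_t`, `uint16_t`, or `uint8_t` corresponds to the w, h, and b units.
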
I also tend to do some silly workarounds to avoid having to read too much
assembly. If I want to set a breakpoint in some specific place, I might do the
following:

```hare
fn _break() void = void;

export fn main() void = {
	// ...some code...

	// Point of interest
	let x = y[z * q];
	_break();
	somefunc(x);

	// ...some code...
};
```

Then I can instruct gdb to "b \_break" to break when this function is called,
use "finish" to step out of the call frame, and I've arrived at the point of
interest without having to rely on line numbers being available in my binary.

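The same trick carries over if parts of your compiler or runtime are written in C, with one catch: an optimizing C compiler may inline or delete an empty function, leaving nothing to break on. Here is a minimal sketch, assuming GCC or Clang; `debug_break` and `lookup` are hypothetical names chosen for illustration.

```c
/* GCC/Clang-specific: noinline keeps the call in the binary, and the empty
 * asm statement stops the optimizer from treating the call as dead code. */
__attribute__((noinline)) void debug_break(void) {
    __asm__ volatile("");
}

/* Hypothetical function standing in for the code under investigation. */
int lookup(const int *y, int z, int q) {
    int x = y[z * q];   /* point of interest */
    debug_break();      /* "b debug_break", then "finish", lands here */
    return x;
}
```

As before, "b debug_break" followed by "finish" puts you right after the point of interest, with no line number information required.
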
Overall, this is a fairly miserable process which can take 5-10× longer
than normal debugging, but with these tips you should at least find your
problems solvable. Good motivation to develop better debugging tools for your
new language, eh? A future blog post might go over some of this with DWARF and
possibly how to teach gdb to understand a new language natively. In the
meantime, good luck!