commit: b8c848ea159ca3f512769d4bdbee9ae9210dab6b
parent 0417fd418ae15536e7e4b29300c4656046922c66
Author: Drew DeVault <sir@cmpwn.com>
Date: Wed, 11 Aug 2021 14:11:14 +0200
Debugging a new PL
Diffstat:
1 file changed, 97 insertions(+), 0 deletions(-)
diff --git a/content/blog/Debugging-your-new-PL.md b/content/blog/Debugging-your-new-PL.md
@@ -0,0 +1,97 @@
+---
+title: Tips for debugging your new programming language
+date: 2021-08-11
+---
+
+Say you're building a new (compiled) programming language from scratch. You'll
+inevitably have to debug programs written in it, and worse, many of these
+problems will lead you into deep magic, as you uncover problems with your
+compiler or runtime. And as you find yourself diving into the arcane arts, your
+tools may be painfully lacking: how do you debug code written in a language for
+which debuggers and other tooling simply have not been written yet?
+
+In the implementation of my own programming language, I have faced this problem
+many times, and developed, by necessity, some skills around debugging with
+crippled tools that may lack an awareness of your language. Of course, the
+ultimate goal is to build out first-class debugging support, but we must have a
+language in the first place before we can write tools to debug it. If you find
+yourself in this situation, here are my recommendations.
+
+First, I'll echo the timeless words of Brian Kernighan:
+
+> The most effective debugging tool is still careful thought, coupled with
+> judiciously placed print statements.
+
+— Unix for Beginners (1979)
+
+Classic debugging techniques are of heightened importance in this environment:
+first seek to isolate the problem code, then to understand the problem code,
+then to form and test a hypothesis, usually with a thoughtful print
+statement. Often, this is enough.
+
+Unfortunately, you may have to fire up gdb. gdb is often painful in the best of
+situations, but if you have to use it without debug symbols, you may find
+yourself shutting off the computer and seeking out rural real estate on which
+you can establish a new career in farming. If you can stomach it, I can offer
+some advice.
+
+First, you're going to be working in assembly, so make sure you're familiar with
+how it works. I would recommend keeping the ISA manual and your ABI
+specification handy. If you're smart and your language sets up stack frames
+properly (this is easy, do it early), you should at least have a backtrace,
+breakpoints at functions, and globals, though all of these will be untyped. You
+can write C casts to add some ad-hoc types to examine data in your process,
+like "print *(int *)$rdi".
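+
+As a rough sketch, a first look at a stopped process might go like this. It
+assumes x86_64 with the System V calling convention, where the first integer
+or pointer arguments arrive in rdi, rsi, and rdx, and it assumes your program
+exports a symbol named "main". The casts are guesses you supply yourself,
+since gdb has no type information to check them against:
+
+```
+(gdb) b main
+(gdb) run
+(gdb) bt
+(gdb) info registers rdi rsi rdx
+(gdb) print *(int *)$rdi
+(gdb) print/x *(unsigned char *)$rsi
+```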
+
+You'll also get used to the 'x' command, which eXamines memory. Its format is
+"x/NFU": N is the number of objects, F the display format (x for hex, d for
+decimal), and U the unit size: b for byte, h for halfword (short), w for word
+(int), and g for giant word (long). "x/8gx $rdi" will interpret rdi as an
+address where 8 longs are stored and print them in hexadecimal. Of particular
+use is the "i" format, for "instruction", which disassembles from an address:
+
+```
+(gdb) x/8i $rip
+=> 0x5555555565c8 <rt.memcpy+4>: mov $0x0,%eax
+ 0x5555555565cd <rt.memcpy+9>: cmp %rdx,%rax
+ 0x5555555565d0 <rt.memcpy+12>: jae 0x5555555565df <rt.memcpy+27>
+ 0x5555555565d2 <rt.memcpy+14>: movzbl (%rsi,%rax,1),%ecx
+ 0x5555555565d6 <rt.memcpy+18>: mov %cl,(%rdi,%rax,1)
+ 0x5555555565d9 <rt.memcpy+21>: add $0x1,%rax
+ 0x5555555565dd <rt.memcpy+25>: jmp 0x5555555565cd <rt.memcpy+9>
+ 0x5555555565df <rt.memcpy+27>: leave
+```
+
+You can set breakpoints on the addresses you find here (e.g. "b
+*0x5555555565d0"), and step through one instruction at a time with the "si"
+command.
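+
+A minimal sketch of such a session, reusing the address from the listing
+above (yours will differ). "display/i $pc" asks gdb to print the next
+instruction after every stop, which saves retyping the "x" command, and the
+last line inspects the two registers compared just before the "jae":
+
+```
+(gdb) b *0x5555555565d0
+(gdb) c
+(gdb) display/i $pc
+(gdb) si
+(gdb) info registers rax rdx
+```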
+
+I also tend to do some silly workarounds to avoid having to read too much
+assembly. If I want to set a breakpoint in some specific place, I might do the
+following:
+
+```hare
+fn _break() void = void;
+
+export fn main() void = {
+ // ...some code...
+
+ // Point of interest
+ let x = y[z * q];
+ _break();
+ somefunc(x);
+
+ // ...some code...
+};
+```
+
+Then I can instruct gdb to "b \_break" to break when this function is called,
+use "finish" to step out of the call frame, and I've arrived at the point of
+interest without having to rely on line numbers being available in my binary.
+
+Overall, this is a fairly miserable process which can take 5-10× longer
+than normal debugging, but with these tips you should at least find your
+problems solvable. Good motivation to develop better debugging tools for your
+new language, eh? A future blog post might go over some of this with DWARF and
+possibly how to teach gdb to understand a new language natively. In the
+meantime, good luck!