
drewdevault.com

[mirror] blog and personal website of Drew DeVault git clone https://hacktivis.me/git/mirror/drewdevault.com.git

generating-ioctls.md (15951B)


---
title: A Hare code generator for finding ioctl numbers
date: 2022-05-14
---

Modern Unix derivatives have this really bad idea called [ioctl]. It's a
function which performs arbitrary operations on a file descriptor. It is
essentially the kitchen sink of modern Unix derivatives, particularly Linux,
where ioctls act almost like a second set of syscalls. For example, to get the
size of the terminal window, you use an ioctl specific to TTY file descriptors:

[ioctl]: https://pubs.opengroup.org/onlinepubs/9699919799/functions/ioctl.html

```hare
let wsz = rt::winsize { ... };
match (rt::ioctl(fd, rt::TIOCGWINSZ, &wsz: *void)) {
case let e: rt::errno =>
	switch (e: int) {
	case rt::EBADF =>
		// fd is not an open file descriptor
		return errors::invalid;
	case rt::ENOTTY =>
		// fd is open, but does not refer to a terminal
		return errors::unsupported;
	case =>
		abort("Unexpected error from ioctl");
	};
case int =>
	return ttysize {
		rows = wsz.ws_row,
		columns = wsz.ws_col,
	};
};
```

This code performs the ioctl syscall against the provided file descriptor "fd",
using the "TIOCGWINSZ" operation, and passes a pointer to a winsize structure
as its parameter. Linux provides thousands of ioctls, and each of them is
assigned a constant like TIOCGWINSZ (0x5413). Some constants, including this
one, are assigned somewhat arbitrarily. Others are assigned with some degree of
structure.

Consider for instance the ioctl TUNSETOWNER, which is used for tun/tap network
devices. This ioctl is assigned the number 0x400454cc, but that number was not
chosen arbitrarily. It's computed by a macro, which we can find in
/usr/include/linux/if_tun.h:

```c
#define TUNSETOWNER _IOW('T', 204, int)
```

The \_IOW macro and its siblings \_IO, \_IOR, and \_IOWR are defined in
/usr/include/asm-generic/ioctl.h. They shift the type letter, the number, the
parameter type (or rather its size), and the direction (R, W, WR, or neither)
into separate bit fields, then OR them together into an unsigned 32-bit number:

```c
/* Excerpt: the _IOC_*SHIFT constants are defined earlier in the same header. */
#define _IOC_WRITE	1U
#define _IOC_TYPECHECK(t) (sizeof(t))
#define _IOC(dir,type,nr,size) \
	(((dir)  << _IOC_DIRSHIFT) | \
	 ((type) << _IOC_TYPESHIFT) | \
	 ((nr)   << _IOC_NRSHIFT) | \
	 ((size) << _IOC_SIZESHIFT))
#define _IOW(type,nr,size)	_IOC(_IOC_WRITE,(type),(nr),(_IOC_TYPECHECK(size)))
```

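As a worked example, here is the arithmetic for TUNSETOWNER. It assumes the asm-generic layout used on x86\_64, where nr occupies bits 0-7, type bits 8-15, size bits 16-29, and direction bits 30-31. The inputs are dir = \_IOC\_WRITE = 1, size = sizeof(int) = 4, type = 'T' = 0x54, and nr = 204 = 0xcc:

```latex
\begin{aligned}
\mathtt{TUNSETOWNER}
  &= (\mathit{dir} \ll 30) \mid (\mathit{size} \ll 16) \mid (\mathit{type} \ll 8) \mid (\mathit{nr} \ll 0) \\
  &= (1 \ll 30) \mid (4 \ll 16) \mid (\mathtt{0x54} \ll 8) \mid 204 \\
  &= \mathtt{0x40000000} \mid \mathtt{0x00040000} \mid \mathtt{0x00005400} \mid \mathtt{0x000000cc} \\
  &= \mathtt{0x400454cc}
\end{aligned}
```

Because the fields never overlap, the OR behaves exactly like addition here.
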
It would be useful to define ioctl numbers in a similar fashion for Hare
programs. However, Hare lacks macros, so we cannot re-implement this in exactly
the same manner. Instead, we can use code generation.

*[Hare](https://harelang.org) is a new systems programming language I've been
working on for a couple of years. Check out the [announcement][0] for more
detail.*

[0]: https://harelang.org/blog/2022-04-25-announcing-hare/

Again using the tun interface as an example, our goal is to turn the following
input file:

```hare
type sock_filter = struct {
	code: u16,
	jt: u8,
	jf: u8,
	k: u32,
};

type sock_fprog = struct {
	length: u16,
	filter: *sock_filter,
};

def TUNSETNOCSUM: u32 = @_IOW('T', 200, int);
def TUNSETDEBUG: u32 = @_IOW('T', 201, int);
def TUNSETIFF: u32 = @_IOW('T', 202, int);
def TUNSETPERSIST: u32 = @_IOW('T', 203, int);
def TUNSETOWNER: u32 = @_IOW('T', 204, int);
def TUNSETLINK: u32 = @_IOW('T', 205, int);
def TUNSETGROUP: u32 = @_IOW('T', 206, int);
def TUNGETFEATURES: u32 = @_IOR('T', 207, uint);
def TUNSETOFFLOAD: u32 = @_IOW('T', 208, uint);
def TUNSETTXFILTER: u32 = @_IOW('T', 209, uint);
def TUNGETIFF: u32 = @_IOR('T', 210, uint);
def TUNGETSNDBUF: u32 = @_IOR('T', 211, int);
def TUNSETSNDBUF: u32 = @_IOW('T', 212, int);
def TUNATTACHFILTER: u32 = @_IOW('T', 213, sock_fprog);
def TUNDETACHFILTER: u32 = @_IOW('T', 214, sock_fprog);
def TUNGETVNETHDRSZ: u32 = @_IOR('T', 215, int);
def TUNSETVNETHDRSZ: u32 = @_IOW('T', 216, int);
def TUNSETQUEUE: u32 = @_IOW('T', 217, int);
def TUNSETIFINDEX: u32 = @_IOW('T', 218, uint);
def TUNGETFILTER: u32 = @_IOR('T', 219, sock_fprog);
def TUNSETVNETLE: u32 = @_IOW('T', 220, int);
def TUNGETVNETLE: u32 = @_IOR('T', 221, int);
def TUNSETVNETBE: u32 = @_IOW('T', 222, int);
def TUNGETVNETBE: u32 = @_IOR('T', 223, int);
def TUNSETSTEERINGEBPF: u32 = @_IOR('T', 224, int);
def TUNSETFILTEREBPF: u32 = @_IOR('T', 225, int);
def TUNSETCARRIER: u32 = @_IOW('T', 226, int);
def TUNGETDEVNETNS: u32 = @_IO('T', 227);
```

Into the following output file:

```hare
type sock_filter = struct {
	code: u16,
	jt: u8,
	jf: u8,
	k: u32,
};

type sock_fprog = struct {
	length: u16,
	filter: *sock_filter,
};

def TUNSETNOCSUM: u32 = 0x400454c8;
def TUNSETDEBUG: u32 = 0x400454c9;
def TUNSETIFF: u32 = 0x400454ca;
def TUNSETPERSIST: u32 = 0x400454cb;
def TUNSETOWNER: u32 = 0x400454cc;
def TUNSETLINK: u32 = 0x400454cd;
def TUNSETGROUP: u32 = 0x400454ce;
def TUNGETFEATURES: u32 = 0x800454cf;
def TUNSETOFFLOAD: u32 = 0x400454d0;
def TUNSETTXFILTER: u32 = 0x400454d1;
def TUNGETIFF: u32 = 0x800454d2;
def TUNGETSNDBUF: u32 = 0x800454d3;
def TUNSETSNDBUF: u32 = 0x400454d4;
def TUNATTACHFILTER: u32 = 0x401054d5;
def TUNDETACHFILTER: u32 = 0x401054d6;
def TUNGETVNETHDRSZ: u32 = 0x800454d7;
def TUNSETVNETHDRSZ: u32 = 0x400454d8;
def TUNSETQUEUE: u32 = 0x400454d9;
def TUNSETIFINDEX: u32 = 0x400454da;
def TUNGETFILTER: u32 = 0x801054db;
def TUNSETVNETLE: u32 = 0x400454dc;
def TUNGETVNETLE: u32 = 0x800454dd;
def TUNSETVNETBE: u32 = 0x400454de;
def TUNGETVNETBE: u32 = 0x800454df;
def TUNSETSTEERINGEBPF: u32 = 0x800454e0;
def TUNSETFILTEREBPF: u32 = 0x800454e1;
def TUNSETCARRIER: u32 = 0x400454e2;
def TUNGETDEVNETNS: u32 = 0x54e3;
```

I wrote the [ioctlgen] tool for this purpose, and since it demonstrates a number
of interesting Hare features, I thought it would make for a cool blog post. This
program must do the following things:

[ioctlgen]: https://git.sr.ht/~sircmpwn/hare/tree/master/item/cmd/ioctlgen/main.ha

- Scan through the file looking for @\_IO\* constructs
- Parse these @\_IO\* constructs
- Determine the size of the type specified by the third parameter
- Compute the ioctl number based on these inputs
- Write the computed constant to the output
- Pass everything else through unmodified

The implementation begins thusly:

```hare
let ioctlre: regex::regex = regex::regex { ... };
let typedefre: regex::regex = regex::regex { ... };

@init fn init() void = {
	ioctlre = regex::compile(`@(_IO[RW]*)\((.*)\)`)!;
	typedefre = regex::compile(`^(export )?type `)!;
};

@fini fn fini() void = {
	regex::finish(&ioctlre);
	regex::finish(&typedefre);
};
```

This sets aside two regular expressions: one that identifies our @\_IO\*
pseudo-macros, and one that identifies type aliases (so that we can parse them
to determine their size later). I also defined some types to store each of the
details necessary to compute the ioctl number:

```hare
// Values match the kernel's _IOC_NONE, _IOC_WRITE, and _IOC_READ
type dir = enum u32 {
	IO = 0,
	IOW = 1,
	IOR = 2,
	IOWR = IOW | IOR,
};

// (direction, type letter, number, parameter type or null for _IO)
type ioctl = (dir, rune, u32, const nullable *types::_type);
```

Hare's standard library includes tools for parsing and analyzing Hare programs
in the [hare namespace]. We'll need to use these to work with types in this
program. At the start of the program, we initialize a "type store" from
hare::types, which processes and stores Hare types for us. The representation
of Hare types varies depending on the architecture (for example, pointer types
have different sizes on 32-bit and 64-bit systems), so we have to specify the
architecture we want. In the future it will be necessary to make this
configurable, but for now I just hard-coded x86\_64:

[hare namespace]: https://docs.harelang.org/hare

```hare
const store = types::store(types::x86_64, null, null);
defer types::store_free(store);
```

The two "null" parameters are not used here. They exist to support evaluating
expressions in type definitions, such as `[8 * 16]int`. Leaving them null is
permissible, but disables this feature.

Following this, we enter a loop which processes the input file line by line,
testing each line against our regular expressions and acting on any matches.
Let's start with the code for handling new types:

```hare
for (true) {
	const line = match (bufio::scanline(os::stdin)!) {
	case io::EOF =>
		break;
	case let line: []u8 =>
		yield strings::fromutf8(line);
	};
	defer free(line);

	if (regex::test(&typedefre, line)!) {
		// Push the line back for loadtype to parse. The most recently
		// unread data is read first, so the newline that scanline
		// stripped goes back before the line itself.
		bufio::unreadrune(os::stdin, '\n');
		bufio::unread(os::stdin, strings::toutf8(line));
		loadtype(store);
		continue;
	};

	// ...to be continued...
```

If we encounter a line which matches our type declaration regular expression,
we unread that line (along with the newline that scanline stripped) back into
the buffered standard input stream and call this "loadtype" function to parse
it and load it into the type store:

```hare
fn loadtype(store: *types::typestore) void = {
	const tee = io::tee(os::stdin, os::stdout);
	const lex = lex::init(&tee, "<ioctl>");
	const decl = match (parse::decl(&lex)) {
	case let err: parse::error =>
		fmt::fatal("Error parsing type declaration:",
			parse::strerror(err));
	case let decl: ast::decl =>
		yield decl;
	};

	const tdecl = decl.decl as []ast::decl_type;
	if (len(tdecl) != 1) {
		fmt::fatal("Multiple type declarations are unsupported");
	};
	const tdecl = tdecl[0];
	const of = types::lookup(store, &tdecl._type)!;
	types::newalias(store, tdecl.ident, of);
};
```

The Hare standard library includes a lexer and parser for Hare itself, which
we're making use of here. The first thing we do is use [io::tee] to copy any
data the parser reads to stdout, passing it through to the output file. Then we
set up a lexer and parse the type declaration. A type declaration looks
something like this:

[io::tee]: https://docs.harelang.org/io#tee

```hare
type sock_fprog = struct {
	length: u16,
	filter: *sock_filter,
};
```

The types::lookup call looks up the struct type, and newalias creates a new
type alias based on that type with the given name (sock\_fprog). Adding this to
the type store will let us resolve the type when we encounter it later on, for
example in this line:

```hare
def TUNGETFILTER: u32 = @_IOR('T', 219, sock_fprog);
```

Back in the main loop, we have another regex test to check if we're looking at a
line with one of these pseudo-macros:

```hare
let groups = match (regex::find(&ioctlre, line)!) {
case void =>
	// Not an ioctl line: pass it through unmodified
	fmt::println(line)!;
	continue;
case let cap: []regex::capture =>
	yield cap;
};
defer free(groups);

const dir = switch (groups[1].content) {
case "_IO" =>
	yield dir::IO;
case "_IOR" =>
	yield dir::IOR;
case "_IOW" =>
	yield dir::IOW;
case "_IOWR" =>
	yield dir::IOWR;
case =>
	fmt::fatalf("Unknown ioctl direction {}", groups[1].content);
};
const ioctl = parseioctl(store, dir, groups[2].content);
```

Recall that the regex from earlier is `@(_IO[RW]*)\((.*)\)`. This has two
capture groups: one for "\_IO" or "\_IOW" and so on, and another for the list of
"parameters" (the zeroth "capture group" is the entire match string). We use the
first capture group to grab the ioctl direction, then we pass that into
"parseioctl" along with the type store and the second capture group.

This "parseioctl" function is kind of neat:

```hare
fn parseioctl(store: *types::typestore, d: dir, params: str) ioctl = {
	const buf = bufio::fixed(strings::toutf8(params), io::mode::READ);
	const lex = lex::init(&buf, "<ioctl>");

	const rn = expect(&lex, ltok::LIT_RUNE).1 as rune;
	expect(&lex, ltok::COMMA);
	const num = expect(&lex, ltok::LIT_ICONST).1 as i64;

	if (d == dir::IO) {
		return (d, rn, num: u32, null);
	};

	expect(&lex, ltok::COMMA);
	const ty = match (parse::_type(&lex)) {
	case let ty: ast::_type =>
		yield ty;
	case let err: parse::error =>
		fmt::fatal("Error:", parse::strerror(err));
	};

	const ty = match (types::lookup(store, &ty)) {
	case let err: types::error =>
		fmt::fatal("Error:", types::strerror(err));
	case types::deferred =>
		fmt::fatal("Error: this tool does not support forward references");
	case let ty: const *types::_type =>
		yield ty;
	};

	return (d, rn, num: u32, ty);
};

fn expect(lex: *lex::lexer, want: ltok) lex::token = {
	match (lex::lex(lex)) {
	case let err: lex::error =>
		fmt::fatal("Error:", lex::strerror(err));
	case let tok: lex::token =>
		if (tok.0 != want) {
			fmt::fatalf("Error: unexpected {}", lex::tokstr(tok));
		};
		return tok;
	};
};
```

Here we've essentially set up a miniature parser based on a Hare lexer to parse
our custom parameter list grammar. We create a [fixed reader] from the capture
group string, then create a lexer based on this and start pulling tokens out of
it. The first parameter is a rune, so we grab a LIT\_RUNE token and extract the
Hare rune value from it, then after a COMMA token we repeat this with
LIT\_ICONST to get the integer constant. dir::IO ioctls don't have a type
parameter, so we can return early in this case.

[fixed reader]: https://docs.harelang.org/bufio#fixed

Otherwise, we use [hare::parse::\_type] to parse the type parameter, producing
a [hare::ast::\_type]. We then pass this to the type store to look up technical
details about this type, such as its size, alignment, storage representation,
and so on. This converts the AST type, which only carries syntactic
information, into an actual type that includes semantic information about it.

[hare::parse::\_type]: https://docs.harelang.org/hare/parse#_type
[hare::ast::\_type]: https://docs.harelang.org/hare/ast#_type

Equipped with this information, we can calculate the ioctl's assigned number:

```hare
def IOC_NRBITS: u32 = 8;
def IOC_TYPEBITS: u32 = 8;
def IOC_SIZEBITS: u32 = 14; // XXX: Arch-specific
def IOC_DIRBITS: u32 = 2; // XXX: Arch-specific

def IOC_NRSHIFT: u32 = 0;
def IOC_TYPESHIFT: u32 = IOC_NRSHIFT + IOC_NRBITS;
def IOC_SIZESHIFT: u32 = IOC_TYPESHIFT + IOC_TYPEBITS;
def IOC_DIRSHIFT: u32 = IOC_SIZESHIFT + IOC_SIZEBITS;

fn ioctlno(io: *ioctl) u32 = {
	// _IO ioctls have no parameter type, so their size field is zero
	const typesz = match (io.3) {
	case let ty: const *types::_type =>
		yield ty.sz;
	case null =>
		yield 0z;
	};
	return (io.0: u32 << IOC_DIRSHIFT) |
		(io.1: u32 << IOC_TYPESHIFT) |
		(io.2 << IOC_NRSHIFT) |
		(typesz: u32 << IOC_SIZESHIFT);
};
```

Then, back in the main loop, we print it to the output:

```hare
// Copy everything before the "@" through verbatim, then the computed number
const prefix = strings::sub(line, 0, groups[1].start - 1);
fmt::printfln("{}0x{:x};", prefix, ioctlno(&ioctl))!;
```

Now we have successfully converted this:

```hare
type sock_filter = struct {
	code: u16,
	jt: u8,
	jf: u8,
	k: u32,
};

type sock_fprog = struct {
	length: u16,
	filter: *sock_filter,
};

def TUNATTACHFILTER: u32 = @_IOW('T', 213, sock_fprog);
```

Into this:

```hare
def TUNATTACHFILTER: u32 = 0x401054d5;
```

A quick C program verifies our result:

```c
#include <linux/ioctl.h>
#include <linux/if_tun.h>
#include <stdio.h>

int main(void) {
	/* The ioctl macros yield an unsigned value whose exact type varies by
	 * platform, so cast it to match the %lx conversion. */
	printf("TUNATTACHFILTER: 0x%lx\n", (unsigned long)TUNATTACHFILTER);
	return 0;
}
```

And:

```
TUNATTACHFILTER: 0x401054d5
```

It works!

---

Critics may draw attention to the fact that we could have saved ourselves much
of this work if Hare had first-class macros, but macros are not aligned with
Hare's design goals, so an alternative solution is called for. This particular
program is useful only in a small set of specific circumstances (and mainly for
Hare developers themselves, less so for most users), but it solves the problem
pretty neatly given the constraints it has to work within.

I think this is a nice case study in a few useful features available from the
Hare standard library. In addition to POSIX Extended Regular Expression support
via the [regex] module, the [hare namespace] offers many tools to provide Hare
programs with relatively deep insights into the language itself. We can use
hare::lex to tokenize the custom grammar for our pseudo-macros, use hare::parse
to parse type declarations, and use hare::types to compute the semantic details
of each type. I also like many of the "little things" on display here, such as
unreading data back into the buffered stdin reader, or using io::tee to copy
data to stdout during parsing.

[regex]: https://docs.harelang.org/regex

I hope you found it interesting!