Computing Tips/Truths
This is inspired by some "CS Falsehoods" that you can find on the internet[1].
As I do not like having to invert everything, I prefer to state truths directly instead; I believe it is more readable this way, and at least it makes it easier for me to describe the issues.
I would love to be proven wrong or shown doubts about any of this; thanks a lot if you do.
- This list isn't absolute truth (but I hope both you and I exercise enough doubt)
- Yourself/Academia/Internet/Myself/Chuck Norris/Any Computer/… cannot always be right
- We've yet to discover enough of how computers work to know what we're doing (reminder: knowing how to pile bricks doesn't make you a good architect)
- You can sometimes detect, after parsing, whether a program will or will not end (finite automata, loops with no end condition)
- You cannot detect for all programs whether they will or will not end ("the halting problem")
- Most programs can be made to crash (and under most operating systems, all of them can)
- You can render an operating system unusable (Denial-of-Service) probably more easily than you think, even with some restrictions:
  - Fixed: Ping of Death, most but not all security issues, …
  - Mitigated: fork bombs, using up all memory, using up all of a filesystem (be careful with logs), exhausting the file descriptor/PID/… limits of the current user or root, …
- Cryptography isn't some magic fairy dust that makes something secure (it can actually make it worse)
- There are no magic solutions to make something secure, but there are good practices
- You will need actual debugging tools (gdb/lldb, dtrace, ping, tcpdump/wireshark, …); learn them
- Open-Source isn't as different from Libre/Free Software as some people say
- There are only a few CS/IT fields where advanced math knowledge is actually useful (which isn't to say that you shouldn't give it a shot anyway)
- There isn't a programming language where a human can directly build everything they want
- Not all programming language interpreters/compilers are written in C (see Go, Haskell, Rust, …)
- One does not simply know how copyright law works on the internet (hint: that means worldwide)
- In most of Europe, a work can only enter the Public Domain through copyright expiration (consider licences like CC0)
- A large number of lines does not mean that the programmer was efficient
- A small program does not mean that it runs efficiently
- A version number isn't a good indicator of quality
- Data decays eternally
- You need threat models for your security
- You do not have a Turing machine: RAM is finite, and memory allocation can fail
- Your programs aren't alone; try to be a good neighbor
- Always online isn't
- Serverless/Passwordless isn't (also, define things by what they are)
- Working replacements aren't (in)direct extensions of what they replace
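In a garbage-collected language, a failed allocation usually shows up as an exception rather than as a NULL pointer. Here is a minimal Python sketch of treating that failure as a normal outcome instead of a crash. `try_alloc` is a hypothetical helper, and the fallback strategy is up to the caller.

```python
from typing import Optional


def try_alloc(n_bytes: int) -> Optional[bytearray]:
    """Hypothetical helper: allocate a zero-filled buffer, or return None if the allocator refuses."""
    try:
        return bytearray(n_bytes)
    except MemoryError:
        # Degrade instead of crashing: use a smaller buffer, stream, report an error, ...
        return None


buf = try_alloc(1024 * 1024)
if buf is None:
    print("could not get 1 MiB, falling back to streaming")
```

On systems that overcommit memory (Linux does by default), an allocation can appear to succeed while the process is killed later by the OOM killer. Checking the result is therefore necessary but not sufficient.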
## Unique IDs
So-called "Unique IDs" aren't always unique:
- A lot of "Unique IDs" can be spoofed or badly generated/stored (quite common for MAC addresses)
- Assigning IDs sequentially makes them enumerable, removes plausible deniability, and can lead to uniqueness issues if you restore storage from a previous point in time; this should be strongly avoided in internet applications
- UUIDs can reasonably be trusted, but be careful about how you use them:
  - The "nil" UUID (all zeros) is valid
  - Version 1 should be avoided in settings where time isn't linear (it can easily jump backwards, start at the same date on every boot, …)
  - Versions 3 (MD5) and 5 (SHA-1) obviously shouldn't be used as security credentials
  - Version 4 is random (122 random bits) and should be avoided when you have reliable time and can get a large sample size (birthday problem)
- In decentralized settings, consider FlakeIDs, 128-bit k-ordered IDs: 64 bits of milliseconds since the UNIX epoch, 48 bits for the node ID, 16 bits of random/sequence
  Implementation: <https://git.pleroma.social/pleroma/elixir-libraries/flake_id>
  Note on the node ID: consider generating a random ID at launch or installation; using another Unique ID like a MAC address has uniqueness and privacy issues.
  It's also assumed that a node can reasonably assert whether a FlakeID in its own namespace was already used.
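The UUID points can be checked with Python's standard `uuid` module. The sketch below shows three things: the nil UUID is an ordinary value, version 5 IDs can be recomputed by anyone who knows the namespace and name, and version 4 collisions follow the usual birthday-bound approximation p ≈ 1 - exp(-n(n-1) / (2 · 2^122)). The helper `v4_collision_probability` is an illustration, not part of any library.

```python
import math
import uuid

# The nil UUID is a perfectly valid value; don't use it as a "no ID" marker by accident.
nil = uuid.UUID(int=0)
print(nil)  # 00000000-0000-0000-0000-000000000000

# Versions 3 and 5 are derived from a namespace and a name: anyone who knows
# both can recompute them, so they are identifiers, not secrets.
a = uuid.uuid5(uuid.NAMESPACE_DNS, "example.org")
b = uuid.uuid5(uuid.NAMESPACE_DNS, "example.org")
print(a == b, a.version)  # True 5


def v4_collision_probability(n: int) -> float:
    """Birthday-bound approximation for n random version 4 UUIDs (122 random bits)."""
    return -math.expm1(-(n * (n - 1)) / (2 * 2.0**122))
```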
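The FlakeID layout described above can be illustrated with a Python sketch. It is hypothetical and is not a port of the Pleroma Elixir library. Under the stated assumptions, it packs a 64-bit millisecond timestamp, a random 48-bit node ID chosen at start-up, and a 16-bit per-millisecond sequence into one 128-bit integer. Two behaviours are this sketch's own choices: refusing to step back when the wall clock does, and waiting when the sequence is exhausted.

```python
import secrets
import threading
import time
from typing import Optional, Tuple

NODE_BITS = 48
NODE_MASK = (1 << NODE_BITS) - 1
SEQ_MASK = 0xFFFF


def _now_ms() -> int:
    # Milliseconds since the UNIX epoch, from the wall clock.
    return time.time_ns() // 1_000_000


class FlakeGenerator:
    """Hypothetical generator: 64-bit ms timestamp | 48-bit node ID | 16-bit sequence."""

    def __init__(self, node_id: Optional[int] = None) -> None:
        # Random node ID at start-up rather than a MAC address.
        self.node_id = secrets.randbits(NODE_BITS) if node_id is None else node_id & NODE_MASK
        self._last_ms = -1
        self._seq = 0
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            now = max(_now_ms(), self._last_ms)  # never go back, even if the clock does
            if now == self._last_ms:
                self._seq = (self._seq + 1) & SEQ_MASK
                if self._seq == 0:
                    # 65536 IDs in one millisecond: wait until the clock passes it
                    # (this can stall for a while if the clock went backwards).
                    while now <= self._last_ms:
                        now = _now_ms()
            else:
                self._seq = 0
            self._last_ms = now
            return (now << 64) | (self.node_id << 16) | self._seq


def split_flake(flake: int) -> Tuple[int, int, int]:
    """Return (timestamp_ms, node_id, sequence)."""
    return flake >> 64, (flake >> 16) & NODE_MASK, flake & SEQ_MASK


gen = FlakeGenerator()
print(split_flake(gen.next_id()))
```

Because the timestamp occupies the most significant bits, IDs from one node sort by creation time. IDs from different nodes are only roughly ordered, which is what "k-ordered" allows for.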
## Correctness
- Asserting correctness is hard (I do not believe that languages like F* entirely solve this)
- An audit only shows the problems it found; it cannot find all of them
- You cannot build an entirely secure machine
## Parsing
- You cannot parse non-regular languages/formats with one regex
- You cannot validate an email address using one regex (well, other than `.+`)
- You cannot validate an email address prior to its actual usage (e.g. UTF-8 might not be supported at the destination)
- You cannot write a parser for all version numbers
- Some regex languages aren't (regular)
- You can parse non-regular languages/formats with multiple regexes (see lex/yacc, awk, perl, …)
- You cannot parse human formats reasonably well
- PEMDAS isn't enough to express the priorities of mathematical evaluation
- Client-side input pattern guards aren't a replacement for server-side validators; if you need to pick one, always pick server-side
- Consider Lua patterns instead of some regex dialect
- Prefer non-backtracking regex dialects (Go/RE2) over Perl's dialect
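Balanced parentheses are the textbook non-regular language. Recognising them needs unbounded memory (here, a counter), which a finite automaton, and so a truly regular expression, does not have. This Python sketch contrasts a counter with a hypothetical regex that can only handle a fixed nesting depth.

```python
import re


def balanced(text: str) -> bool:
    """Check parenthesis balance with a depth counter (unbounded memory)."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


# A regular expression can only approximate this, e.g. by spelling out
# a fixed number of nesting levels (two here).
TWO_LEVELS = re.compile(r"(?:[^()]|\((?:[^()]|\([^()]*\))*\))*")

print(balanced("((()))"), TWO_LEVELS.fullmatch("((()))") is not None)  # True False
```

Dialects with recursion, such as PCRE's `(?R)`, can match this language. That is exactly the sense in which those regex languages aren't regular.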
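To show why a single "reasonable" regex fails at email validation, here is a hypothetical pattern of the kind often found in code bases, written for illustration and not taken from any specification. It is tested against addresses that are syntactically valid under RFC 5321 or the internationalised email RFCs (6531/6532).

```python
import re

# Hypothetical "looks reasonable" pattern, for illustration only.
NAIVE = re.compile(r"[A-Za-z0-9._]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}")

valid_but_rejected = [
    "user+tag@example.org",      # plus addressing
    '"john doe"@example.org',    # quoted local part
    "user@[192.0.2.1]",          # address literal
    "δοκιμή@παράδειγμα.δοκιμή",  # internationalised address
]

for address in valid_but_rejected:
    print(address, NAIVE.fullmatch(address) is not None)  # False for each one
```

Even a pattern that accepted all of these would still not tell you whether the destination exists or supports UTF-8. Only actually sending a message does.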
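PEMDAS says nothing about associativity or unary minus, and real evaluators must decide both. Python's rules make a compact example. Other languages and tools decide differently: for instance, Excel evaluates `=-2^2` to 4.

```python
# Exponentiation is right-associative: 2**3**2 is 2**(3**2), not (2**3)**2.
print(2**3**2)          # 512, not 64
# Unary minus binds looser than ** in Python: -2**2 is -(2**2).
print(-2**2)            # -4
# Same-precedence operators are evaluated left to right: 8/2*(2+2) is (8/2)*4.
print(8 / 2 * (2 + 2))  # 16.0
```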
## Unix maintains bugs
- You cannot trust PIDs to point to the same program at two different times (making the uptime at process creation part of the ID would help a lot, though)
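On Linux, something close to "uptime as part of the ID" is already available. Field 22 of `/proc/<pid>/stat` is the process start time in clock ticks since boot, so a `(pid, start_time)` pair is much harder to confuse than a bare PID. This is a Linux-only sketch, and `process_identity` is a hypothetical helper.

```python
import os


def process_identity(pid: int) -> tuple:
    """Return (pid, start_time_ticks) for a Linux process, from /proc/<pid>/stat."""
    with open(f"/proc/{pid}/stat", "rb") as f:
        data = f.read()
    # Field 2 (comm) is in parentheses and may itself contain spaces or ')',
    # so split after the *last* ')'.
    rest = data[data.rindex(b")") + 2:].split()
    # rest[0] is field 3 (state), so field 22 (starttime) is rest[22 - 3].
    return pid, int(rest[19])


saved = process_identity(os.getpid())
# Later: only act on the PID if process_identity(pid) still equals `saved`.
```

A race remains between the check and any action taken on the PID. On Linux 5.3 and later, process file descriptors (`pidfd_open`, exposed in Python as `os.pidfd_open`) close that gap.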
## Standards
"The nice thing about standards is that you have so many to choose from." — Andrew S. Tanenbaum; Computer Networks, 2nd ed., p. 254.
- POSIX is not followed by most Unix systems (in fact, the certifications seem off/wrong)
- W3C standards are written first; implementations (maybe) come later
- 2+ implementations first; description in RFCs later
- Most standards aren't enforced or properly certified (even setting aside correctness issues)
## Time
- Time sometimes goes backwards (and I believe that most operating systems need to allow setting an earlier date at runtime)
- You probably cannot correctly represent time (on Earth) in one format (as in, ISO 8601 isn't absolute, and iCal is a gigantic mess: <https://www.jwz.org/blog/2018/02/timezones/>)
- In a human representation: hours aren't always between 0 and 23, minutes aren't always between 0 and 59, seconds aren't always between 0 and 59
- We do not have a universal way to count seconds; see RFC 3339's introduction and https://what-if.xkcd.com/26/
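Because the wall clock can go backwards, durations should be measured with a monotonic clock. In Python that is `time.monotonic()`, which is documented as unable to go backwards and unaffected by system clock updates. `timed` is a hypothetical helper used for illustration.

```python
import time


def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds), measured with a monotonic clock.

    time.time() follows the wall clock, which can be stepped backwards
    (NTP corrections, manual changes, boots without a working RTC).
    """
    start = time.monotonic()
    result = fn(*args, **kwargs)
    return result, time.monotonic() - start


print(timed(sum, range(1000)))
```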
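Leap seconds are a concrete case of "seconds aren't between 0 and 59", and even one standard library can disagree with itself about them. Python's `time.strptime` accepts a seconds field of 60, while its `datetime` type cannot represent it. The timestamp below is the leap second inserted at the end of 2016 (UTC). `datetime_accepts` is a hypothetical helper.

```python
import time
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"
LEAP = "2016-12-31 23:59:60"


def datetime_accepts(text: str) -> bool:
    try:
        datetime.strptime(text, FMT)
        return True
    except ValueError:
        return False


print(time.strptime(LEAP, FMT).tm_sec)  # 60: the time module keeps the leap second
print(datetime_accepts(LEAP))           # False: datetime only allows seconds 0..59
```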
## Internet lies
- Files in lossy formats do not lose more data over time than lossless ones
## Git
- You can rewrite history (git-filter-branch, git-rebase, git-reset, ...)
## Names
- Consider using something comparable to tag URIs (RFC 4151) to avoid having a namespace that can have conflicts or be burned
- (Human) names are weird: verify any assumption you have, and then still keep some doubt. I wrote <https://hacktivis.me/articles/real%20names> for a reason.
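An RFC 4151 tag URI has the shape `tag:<authority>,<date>:<specific>`. The authority is a DNS name or email address you controlled on that date, and the date is what keeps the namespace yours even if the name later changes hands. Here is a minimal Python sketch with a hypothetical `tag_uri` helper. It always emits the full `YYYY-MM-DD` date, although the RFC also allows `YYYY` or `YYYY-MM`.

```python
from datetime import date
from urllib.parse import quote

# Characters RFC 4151 allows unescaped in the specific part (pchar, "/" and "?"),
# besides the unreserved ones that quote() never escapes.
SPECIFIC_SAFE = "/?:@!$&'()*+,;="


def tag_uri(authority: str, minted: date, specific: str) -> str:
    """Build a tag URI; `minted` must be a date on which you controlled `authority`."""
    return f"tag:{authority},{minted.isoformat()}:{quote(specific, safe=SPECIFIC_SAFE)}"


print(tag_uri("example.org", date(2024, 5, 1), "blog/post 1"))
# tag:example.org,2024-05-01:blog/post%201
```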
## See Also
I would also highly recommend watching https://www.destroyallsoftware.com/talks/ideology which covers:
- Functional programming can be efficient enough
- Garbage collection can be efficient enough
- NULL isn't the only way to represent absence
- Type systems cannot replace tests
- Tests cannot replace types
1: https://www.netmeister.org/blog/cs-falsehoods.html