- Computing Tips/Truths
- This is inspired by some "CS Falsehoods" that you can find on the internet[1].
- As I do not like having to invert everything, I prefer to state truths directly. I believe it is more readable this way; at least it makes it easier for me to describe the issues.
- I would love to be proven wrong or given reasons to doubt any of this; thanks a lot if you do.
- - This list isn't absolute truth (but I hope you and I exercise enough doubt)
- - Yourself/Academia/Internet/Myself/Chuck Norris/Any Computer/… cannot always be right
- - We've yet to discover enough of how computers work to know what we're doing (reminder: knowing how to pile bricks doesn't make you a good architect)
- - You can sometimes detect after parsing if a program will or will not end (finite-automata / loop with no end condition)
- - You cannot detect for all programs if they will or will not end ("The halting problem")
- - Most programs can be made to crash (and under most Operating Systems it's all of them)
- - You can render an operating system unusable (Denial-of-Service) probably more easily than you think, even with some restrictions:
- - Fixed: Ping of Death, most but not all security issues, …
- - Mitigated: Forkbombs, using up all memory, using up all of a filesystem (be careful with logs), eating the limit of file descriptors/PIDs/… of the current user or root, …
- - Cryptography isn't some magic fairy dust to make something secure (it can actually make it worse)
- - There are no magic solutions to make something secure, but there are good practices
- - You will need actual debugging tools (gdb/lldb, dtrace, ping, tcpdump/wireshark, …), learn them
- - Open-Source isn't as different from Libre/Free-software as some people say
- - There are only a few CS/IT fields where advanced math knowledge is actually useful (not to say that you shouldn't give it a shot anyway)
- - There isn't a programming language where a human can directly build everything they want
- - Not all programming language interpreters/compilers are written in C (see Go, Haskell, Rust, …)
- - One does not simply know how copyright laws work on the internet (hint: that means worldwide)
- - Public Domain in most of Europe is only valid for "Copyright Expiration" (consider licences like CC0)
- - A large number of lines does not mean that the programmer was efficient
- - A small program does not mean that it runs efficiently
- - A version number isn't a good indicator of quality
- - Data decays eternally
- - You need threat models for your security
- - You do not have a Turing machine: RAM is finite, memory allocation can fail
- - Your programs aren't alone, try to be a good neighbor
- - Always online isn't
- - Serverless/Passwordless isn't (also define things by what they are)
- - Working Replacements aren't (in)direct extensions of what they replace
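- As an illustration of the "sometimes detectable after parsing" case above, here is a minimal sketch that flags a loop with no end condition. The function name `has_obvious_infinite_loop` is made up for this example, and it is only a heuristic: it ignores exceptions raised by called functions, process exit, and anything that is not a constant loop condition, so the halting problem is untouched.

```python
import ast


def has_obvious_infinite_loop(source: str) -> bool:
    """Return True if `source` contains a `while <truthy constant>:` loop
    whose body has no break, return, or raise statement at all."""
    tree = ast.parse(source)
    exits = (ast.Break, ast.Return, ast.Raise)
    for node in ast.walk(tree):
        if not isinstance(node, ast.While):
            continue
        test = node.test
        # Only constant, truthy conditions (`while True:`, `while 1:`) are considered.
        if not (isinstance(test, ast.Constant) and bool(test.value)):
            continue
        # Look for any exit statement anywhere inside the loop body.
        has_exit = any(
            isinstance(inner, exits)
            for stmt in node.body
            for inner in ast.walk(stmt)
        )
        if not has_exit:
            return True
    return False
```

- A `False` result only means "nothing obvious found", never "this program ends".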
- ## Unique IDs
- So called "Unique IDs" aren't always unique:
- - A lot of "Unique IDs" can be spoofed or badly generated/stored (quite common for MAC Addresses)
- - Assigning IDs sequentially allows enumeration, removes plausible deniability, and can lead to uniqueness issues if you restore storage from a previous point in time; this should be strongly avoided in internet applications
- - In the case of UUIDs, they can be reasonably trusted, but be careful about how you use them:
- - "nil" UUID (entirely zero) is valid
- - version 1 should be avoided in settings where time isn't linear (can easily jump backwards, always at the same date on boot, …)
- - version 3 (MD5) and 5 (SHA-1) obviously shouldn't be used as security credentials
- - version 4 is purely random and should be avoided when you have reliable time and can get a large sample size (Birthday problem)
- - In decentralized settings consider FlakeIDs, 128-bit k-ordered IDs: 64 bits of milliseconds since the UNIX epoch, 48 bits for the node-ID, 16 bits of random/sequence
- Implementation: <https://git.pleroma.social/pleroma/elixir-libraries/flake_id>
- Note on the node-ID: Consider generating a random ID at launch or installation; using another Unique ID like a MAC Address has uniqueness issues and privacy issues
- It's also assumed that a node can reasonably assert whether a FlakeID in its own namespace was already used.
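- The FlakeID layout described above can be sketched as follows. This is not a port of the linked Elixir library: the class name `FlakeGenerator`, the helper `split_flake`, and the choices to use the 16 low bits as a per-millisecond sequence and to reuse the last timestamp when the clock stalls or goes backwards are all assumptions made for this example.

```python
import secrets
import threading
import time

NODE_MASK = (1 << 48) - 1
SEQ_MASK = (1 << 16) - 1


class FlakeGenerator:
    """Hypothetical generator: 64 bits of ms | 48 bits of node ID | 16 bits of sequence."""

    def __init__(self, node_id=None):
        # Random 48-bit node ID picked at launch, as the note above suggests.
        self.node_id = secrets.randbits(48) if node_id is None else node_id & NODE_MASK
        self._lock = threading.Lock()
        self._last_ms = 0
        self._seq = 0

    def next_id(self) -> int:
        with self._lock:
            now_ms = time.time_ns() // 1_000_000
            if now_ms <= self._last_ms:
                # Clock stalled or went backwards: stay on the last timestamp
                # and bump the sequence so IDs remain unique and ordered.
                now_ms = self._last_ms
                self._seq += 1
                if self._seq > SEQ_MASK:
                    # Sequence exhausted: borrow the next millisecond.
                    now_ms += 1
                    self._seq = 0
            else:
                self._seq = 0
            self._last_ms = now_ms
            return (now_ms << 64) | (self.node_id << 16) | self._seq


def split_flake(flake: int):
    """Return (milliseconds, node_id, sequence) from a 128-bit flake."""
    return flake >> 64, (flake >> 16) & NODE_MASK, flake & SEQ_MASK
```

- Within one process this keeps IDs strictly increasing; it does nothing about two nodes that happen to pick the same random node-ID, or about state lost across restarts.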
- ## Correctness
- - Asserting correctness is hard (I do not believe that languages like F* actually solve this entirely)
- - An audit only shows the problems found; it cannot find all of them
- - You cannot build an entirely secure machine
- ## Parsing
- - You cannot parse non-regular languages/formats with one regex
- - You cannot validate an email address using one regex (Well, other than `.+`)
- - You cannot validate an email address prior to its actual usage (e.g. UTF-8 might not be supported at destination)
- - You cannot write a parser for all version numbers
- - Some regex languages aren't (regular)
- - You can parse non-regular languages/formats with multiple regexes (see lex/yacc, awk, perl, …)
- - You cannot parse human formats reasonably well
- - PEMDAS isn't enough to express Math evaluation priorities
- - Client-side input pattern guards aren't a replacement for server-side validators; if you need to pick one, always pick server-side
- - Consider lua patterns instead of some regex dialect
- - Prefer non-backtracking regex dialects (Go/RE2) instead of Perl's dialect
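- To illustrate the point about multiple regexes, here is a minimal lex/yacc-style sketch: a few small regexes split the input into tokens, and plain recursive functions handle the nesting that a single regex cannot. The grammar (integers, `+ - * /`, parentheses) and the names `tokenize` and `evaluate` are assumptions chosen for this example.

```python
import re

# One small regex per token kind, tried in order (lex-style).
TOKEN_SPECS = [
    ("NUM", re.compile(r"[0-9]+")),
    ("OP", re.compile(r"[-+*/()]")),
    ("SPACE", re.compile(r"\s+")),
]


def tokenize(text):
    tokens, pos = [], 0
    while pos < len(text):
        for kind, pattern in TOKEN_SPECS:
            match = pattern.match(text, pos)
            if match:
                if kind != "SPACE":
                    tokens.append(match.group())
                pos = match.end()
                break
        else:
            raise ValueError(f"unexpected character at offset {pos}")
    return tokens


def evaluate(text):
    tokens, pos = tokenize(text), 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def expr():  # expr := term (("+" | "-") term)*
        value = term()
        while peek() in ("+", "-"):
            value = value + term() if take() == "+" else value - term()
        return value

    def term():  # term := factor (("*" | "/") factor)*
        value = factor()
        while peek() in ("*", "/"):
            value = value * factor() if take() == "*" else value / factor()
        return value

    def factor():  # factor := NUM | "(" expr ")"
        token = take()
        if token == "(":
            value = expr()
            if take() != ")":
                raise ValueError("expected ')'")
            return value
        if token is not None and token.isdigit():
            return int(token)
        raise ValueError(f"unexpected token {token!r}")

    result = expr()
    if peek() is not None:
        raise ValueError(f"unexpected token {peek()!r}")
    return result
```

- The regexes never see the nesting; the recursion in `factor` does, which is the part a regular language cannot express.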
- ## Unix maintains bugs
- - You cannot trust PIDs to point to the same program at two different times (making uptime part of the IDs would help a lot though)
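- A rough sketch of the "add time to the ID" idea, assuming a Linux-style `/proc` filesystem: pair the PID with the process start time (field 22 of `/proc/<pid>/stat`, in clock ticks since boot). The function names are made up for this example, and it returns `None` where `/proc` is not available.

```python
import os


def process_identity(pid: int):
    """Return (pid, start_time_ticks), or None if /proc/<pid>/stat is unreadable."""
    try:
        with open(f"/proc/{pid}/stat", "rb") as stat_file:
            stat = stat_file.read().decode("ascii", "replace")
    except OSError:
        return None
    # Field 2 (comm) is in parentheses and may contain spaces or ')',
    # so only split what comes after the last ')'.
    fields = stat[stat.rindex(")") + 2:].split()
    # fields[0] is field 3 (state), so field 22 (starttime) is fields[19].
    return pid, int(fields[19])


def same_process(identity) -> bool:
    """Check that a previously recorded identity still names the same process."""
    return identity is not None and process_identity(identity[0]) == identity
```

- This narrows the problem but does not remove it: the PID can still be recycled between the check and whatever you do next with it.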
- ## Standards
- "The nice thing about standards is that you have so many to choose from." — Andrew S. Tanenbaum; Computer Networks, 2nd ed., p. 254.
- - POSIX is not followed by most Unix systems (in fact the certifications seem off/wrong)
- - W3C standards are written first; implementations (maybe) later
- - 2+ implementations first; description in RFCs later
- - Most standards aren't enforced or properly certified (even when considering Correctness issues)
- ## Time
- - Time sometimes goes backwards (And I believe that you need to be able to set an earlier date during runtime for most operating systems)
- - You probably cannot represent time (on earth) in one format correctly (as in, ISO8601 isn't absolute and iCal is a gigantic mess <https://www.jwz.org/blog/2018/02/timezones/>)
- - In a human representation: Hours aren't always between 0 and 23, Minutes aren't always between 0 and 59, Seconds aren't always between 0 and 59 (e.g. a leap second is written 23:59:60)
- - We do not have a universal way to count seconds, see RFC3339's introduction and https://what-if.xkcd.com/26/
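- Since the wall clock can be set backwards at runtime, durations are better measured with a monotonic clock. A minimal sketch, where the helper name `timed` is made up for this example:

```python
import time


def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed_seconds).

    time.monotonic() is not affected by wall-clock adjustments,
    unlike time.time(), so the elapsed value is never negative.
    """
    start = time.monotonic()
    result = func(*args, **kwargs)
    elapsed = time.monotonic() - start
    return result, elapsed
```

- The monotonic value is only meaningful as a difference between two readings in the same process; it is not a date.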
- ## Internet lies
- - Files in lossy formats do not lose more data over time than lossless ones
- ## Git
- - You can rewrite history (git-filter-branch, git-rebase, git-reset, ...)
- ## Names
- - Consider using something comparable to tag URIs (RFC 4151) to avoid having a namespace that can get conflicts or be burned
- - (Human) names are weird: verify any assumption you have, and then still keep some doubt. I wrote <https://hacktivis.me/articles/real%20names> for a reason.
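- For the tag URI suggestion above, the shape from RFC 4151 is `tag:<authority>,<date>:<specific>`, where the authority is a domain name or email address you controlled on that date. A minimal sketch, with the helper name `make_tag_uri` and the `example.com` authority being placeholders; it only checks the date shape and does no escaping or validation of the other parts.

```python
import re

# RFC 4151 date: year, optionally followed by -month and -day.
_TAG_DATE = re.compile(r"[0-9]{4}(-[0-9]{2}(-[0-9]{2})?)?")


def make_tag_uri(authority: str, date: str, specific: str) -> str:
    """Build a tag URI; `authority` and `specific` are trusted as-is (assumption)."""
    if not _TAG_DATE.fullmatch(date):
        raise ValueError(f"date must be YYYY, YYYY-MM or YYYY-MM-DD, got {date!r}")
    return f"tag:{authority},{date}:{specific}"


# Example with a placeholder authority:
# make_tag_uri("example.com", "2024-01", "projects/demo")
```

- Because the date is part of the name, a later owner of the same domain mints names in a different namespace, which is what keeps old names from being burned.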
- ## See Also
- I would also highly recommend watching https://www.destroyallsoftware.com/talks/ideology which has:
- - Functional Programming can be efficient enough
- - Garbage Collection can be efficient enough
- - NULL isn't the only way to represent absence
- - Type systems cannot replace tests
- - Tests cannot replace types
- 1: https://www.netmeister.org/blog/cs-falsehoods.html